I. Real Party in Interest

The real party in interest of this Appeal is the 
Assignee of the above-referenced patent application, 
diaDexus, Inc.

II. Related Appeals and Interferences

The appellant, the appellant's legal representative, 
and the assignees are not aware of any other appeals or 
interferences which will directly affect or be directly 
affected by or have a bearing on the Board's decision in 
the instant appeal.

III. Status of the Claims

Claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13 
are canceled. 

Claim 14 is rejected and is the subject of appeal. 

Claims 15, 16, 17, 18, 19 and 20 are canceled. 

Claims 21, 22, 23, 24, 25, 26, 27 and 28 are rejected and
are the subject of appeal.

Claims 29, 30, 31, 32, 33 and 34 are canceled.

Claims 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48 and 49 are rejected and are the subject of appeal.

IV. Status of Amendments

All amendments have been entered.

V. Summary of the Claimed Subject Matter

The claimed subject matter relates to isolated 
antibodies or antibody fragments that bind to a protein 
referred to by Appellant as Ovr110. More specifically, the
claims are drawn to isolated antibodies or antibody
fragments that bind specifically to a protein encoded by
polynucleotide sequence SEQ ID NO:1 or to a fragment of the
protein encoded by SEQ ID NO:1 which is encoded by SEQ ID
NO:12 or 13. Methods for binding these antibodies on a
cell are also claimed. 

The polynucleotide sequence of SEQ ID NO:1 and
fragments SEQ ID NO:12 and 13 are set forth in the Sequence
Listing. The polynucleotide sequence of SEQ ID NO:1 is
inclusive of the entire open reading frame for the protein.
Further, the specification teaches in Examples 1 and 2 at
pages 16-18 that SEQ ID NO:1 is an mRNA molecule and thus
has a set 5' to 3' orientation.

Antibodies and antibody fragments against Cancer 
Specific Genes such as SEQ ID NO:1 and fragments thereof
such as SEQ ID NO:12 and 13 as well as methods for use of
these antibodies are described in detail in the
specification, for example at pages 11-12 and 14-15.
Teachings in Examples 1 and 2 relating to mRNA
overexpression of Ovr110 (SEQ ID NO:1) provide further
evidence of the utility of this Cancer Specific Gene
as a diagnostic marker for gynecologic cancers.

VI. Grounds of Rejections to be Reviewed on Appeal

Whether claims 14, 21, 22, 23, 24, 25, 26, 27, 28, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 and 49 
meet the utility requirement of 35 U.S.C. 101. 

Whether claims 14, 21, 22, 23, 24, 25, 26, 27, 28, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 and 49
meet the enablement requirement of 35 U.S.C. 112, first
paragraph.

Whether claims 14, 21, 22, 23, 24, 25, 26, 27, 28, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 and 49
meet the written description requirement of 35 U.S.C. 112,
first paragraph.

VII. Arguments

Claims of the instant application are drawn to 
isolated antibodies or antibody fragments that bind 
specifically to a protein encoded by a polynucleotide 
sequence SEQ ID NO:1, or a fragment of the protein encoded
by polynucleotide sequence SEQ ID NO:1, wherein the
fragment is encoded by polynucleotide sequence SEQ ID NO:12
or 13. Also claimed are methods for binding these
antibodies or antibody fragments on a cell by contacting
the cell with the isolated antibody or antibody fragment.
The pending claims stand rejected under 35 U.S.C. 101 and
35 U.S.C. 112, first paragraph, as not being supported by
either a substantial utility or a well-established utility.
The pending claims also stand
rejected under 35 U.S.C. 112, first paragraph, for failing
to meet the written description requirement. The
underlying question in each rejection raised by the
Examiner is whether express written disclosure in the
specification of an amino acid sequence for a protein
encoded by an expressly disclosed full length nucleic acid,
in this case SEQ ID NO:1, is required to meet the utility
and enablement requirements as set forth in 35 U.S.C. 101
and 112, first paragraph, as well as the written
description requirement set forth in 35 U.S.C. 112, first
paragraph, for a claim drawn to an isolated antibody or
antibody fragment that binds specifically to a protein
encoded by polynucleotide sequence SEQ ID NO:1.

Appellant respectfully submits that in the instant 
patent application, with the instant facts, express written 
disclosure in the specification of an amino acid sequence 
for a protein encoded by an expressly disclosed full length 
nucleic acid, in this case SEQ ID NO:1, is not required to
meet the statutory requirements of 35 U.S.C. 101 and 35 
U.S.C. 112, first paragraph. 
A. Rejection under 35 U.S.C. 101 

With respect to the rejection of the pending claims 
under 35 U.S.C. 101, MPEP 2107.02 and the case law are 
clear, 

as a matter of Patent Office practice, a specification 
which contains a disclosure of utility which 
corresponds in scope to the subject matter sought to 
be patented must be taken as sufficient to satisfy the 
utility requirement of §101 for the entire claimed 
subject matter unless there is reason for one skilled 
in the art to question the objective truth of the 
statement of utility or its scope. 
In re Langer, 503 F.2d 1380, 1391, 183 USPQ 288, 297 (CCPA
1974) (emphasis in original).

The originally filed specification contains a 
disclosure of utility which corresponds in scope to the 
claimed subject matter. Specifically, at page 3, line 26
through page 4, line 2, as well as page 7, lines 2-35, of
the specification, it is taught that nine Cancer Specific 
Genes (CSGs) have been identified and refer, among other 
things, to native proteins expressed by the genes 
comprising the polynucleotide sequences of any of SEQ ID 
NO: 1, 2, 3, 4, 5, 6, 7, 8 or 9, the native mRNAs encoded 
by the genes comprising any of the polynucleotide sequences 
of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8 or 9 or the actual 
genes comprising any of the polynucleotide sequences of SEQ 
ID NO: 1, 2, 3, 4, 5, 6, 7, 8 or 9. It is also taught that 
fragments of the CSGs such as those depicted in SEQ ID 
NO: 10, 11, 12, 13 or 14 can also be detected. Further, at 
page 6, lines 3 through 20, of the specification it is 
stated that: 

antibodies against CSG or fragments of such antibodies 
which can be used to detect or image localization of 
CSG in a patient for the purpose of detecting or 
diagnosing selected cancers. Such antibodies can be 
polyclonal or monoclonal, or prepared by molecular 
biology techniques. The term "antibody", as used 
herein and throughout the instant specification is 
also meant to include aptamers and single-stranded 
oligonucleotides such as those derived from an in 
vitro evolution protocol referred to as SELEX and well 
known to those skilled in the art. Antibodies can be 
labeled with a variety of detectable labels including, 
but not limited to, radioisotopes and paramagnetic 
metals. These antibodies or fragments thereof can 
also be used as therapeutic agents in the treatment of 
diseases characterized by expression of a CSG. In 
therapeutic applications, the antibody can be used 
without or with derivatization to a cytotoxic agent
such as a radioisotope, enzyme, toxin, drug or a
prodrug. 

In addition, at page 11, line 5 through page 12, line 7 of
the instant specification, assay techniques that can be used
to determine levels of a CSG of the present invention in a
sample derived from a patient are described. Included in
the assays taught in the specification are
radioimmunoassays, immunohistochemistry assays, competition
assays, Western Blot analyses and ELISA assays, all of
which are well known to those of skill in the art and
involve antibodies to the CSG. ELISA and competition assays
are described in detail at page 11, line 16 through page
12, line 7 and page 12, lines 8 through 29, respectively,
and each assay is explicitly stated to require an antibody
specific to CSG. In vivo antibody uses are also taught in
the specification at page 14, line 5 through page 15, line
27 in a subsection of the specification entitled "In Vivo
Antibody Use". Therein it is stated that:

Antibodies against CSG can also be used in vivo in 
patients suspected of suffering from a selected cancer 
including lung cancer or gynecologic cancers such as 
ovarian, breast, endometrial or uterine cancer [and 
that] antibodies against a CSG can be injected into a 
patient suspected of having a selected cancer for 
diagnostic and/or therapeutic purposes. 

It is further stated that "use of antibodies for in vivo
diagnosis is well known in the art" and several examples of
antibodies used as in vivo diagnostics in cancer are
provided as evidence to support this statement. In 
addition, details on administering antibodies against a CSG 
for the purpose of diagnosing or staging of the disease 
status of the patient are set forth as well as injection of 
an antibody against a CSG for therapeutic benefit. Again, 
several examples of antibodies used therapeutically in 
cancer are provided as evidence of the credibility of the 
substantial asserted utility of the instant claimed 
invention. 

Finally, Examples 1 and 2 set forth at pages 16-18 of 
the specification describe experiments relating to mRNA 
overexpression of SEQ ID NO:1, also referred to as Ovr110,
demonstrative to the skilled artisan of its utility as a
diagnostic marker for gynecologic cancers.

Thus, teachings of the original, as-filed
specification clearly assert a substantial utility for the
claimed invention.

Further, Appellant provided with the Office Action 
response filed May 3, 2005 confirming evidence that the 
claimed invention is useful in the manner taught in the 
originally filed application. Appellant provided two 
publications confirming teachings in the originally filed 
specification that elevated mRNA expression of Ovr110 (SEQ
ID NO:1) in gynecologic cancer tissues as taught at pages
17-24 of the specification correlates with measurable
protein levels in gynecologic cancers.

Specifically, Tringler et al. published results from a
study designed to investigate the expression of the DDO110
protein, also known as Ovr110 (the protein encoded by SEQ
ID NO:1), which is homologous to B7-H4 (see page 1842, col.
2 of Tringler et al.), in normal breast and in primary and
metastatic breast carcinoma in the March 1, 2005 issue of
Clinical Cancer Research. A copy of this reference is
provided herewith as Evidence Appendix A. Ovr110 protein
exhibited nearly ubiquitous expression in breast cancer,
independent of tumor grade or stage, and is suggested to
have a critical role in breast cancer biology. At page
1842, col. 2, Tringler et al. teach that this gynecologic
cancer marker was initially identified and characterized
via quantitative PCR analysis (such as set forth in the
instant specification at pages 17-24). Experiments and
results set forth at pages 1843-1845 of Tringler et al.
confirm the substantial asserted utility of an antibody of
the claimed invention in detecting overexpression of Ovr110
in cancer tissues in accordance with methods such as taught
at pages 11-15 of the originally filed specification.

Also provided by Appellant with the Office Action response
filed May 3, 2005 was a reference by Salceda et al.
available publicly online on March 9, 2005 and published in
the May 15, 2005 issue of Experimental Cell Research,
Volume 306, number 1 at pages 128-141. A copy of this
reference is provided herewith as Evidence Appendix B. In
the results section at page 132 (col. 2), Salceda et al.
teach that the detection of Ovr110 protein (the protein
encoded by SEQ ID NO:1) (referred to therein as B7-H4 or
DDO110) "in human breast and ovarian cancers but not in
most normal adult tissues by Western blot is in good
agreement with mRNA expression data." Further, in the
Discussion at page 139, it is taught that "B7-H4 mRNA was
overexpressed in serous ovarian cancer and a majority of
breast cancers with little or no expression in a variety of
normal tissues surveyed" and that "Western blots with a
monoclonal antibody against B7-H4 showed that B7-H4 protein
expression reflected this mRNA distribution".

MPEP 2107.02B teaches that where an applicant has
specifically asserted that an invention has a particular
utility, the assertion cannot simply be dismissed by Office
personnel as being "wrong" even when there may be reason to
believe that the assertion is not entirely accurate.
Rather, Office personnel must determine if the assertion of
utility is credible (i.e., whether the assertion of utility
is believable to a person of ordinary skill in the art
based on the totality of evidence and reasoning provided).
An assertion is credible unless (A) the logic underlying 
the assertion is seriously flawed, or (B) facts upon which 
the assertion is based are inconsistent with the logic 
underlying the assertion. 

For the instant invention, the logic underlying the 
asserted utility is clearly not flawed. Nor are the facts 
upon which the assertion is based inconsistent with the 
logic underlying the assertion. Instead, the asserted 
utility has been confirmed in publications by Tringler et 
al. and Salceda et al. Accordingly, the asserted utility 
of the instant invention must be credible. 

The Court in In re Rinehart, 531 F.2d 1048, 1052, 189 
USPQ 143, 147 (CCPA 1976), held that "[w]hen the record as 
a whole would make it more likely than not that the 
asserted utility for the claimed invention would be 
considered credible by a person of ordinary skill in the 
art, the Office cannot maintain the rejection."
Accordingly, Appellant submitted with their Reply mailed
April 22, 2008, a Declaration by Dr. Patrick Sluss. A copy
of Dr. Sluss' Declaration is provided herewith as Evidence
Appendix C. Paragraphs 1 through 3 of Dr. Sluss'
Declaration make clear that he is one of skill in this art.
After review of the instant application and in particular
data presented in Examples 1 and 2 of the patent
application relating to mRNA overexpression of Ovr110, Dr.
Sluss believed Ovr110 to be useful as a diagnostic marker
for gynecologic cancers. See specifically paragraph 5 of
Dr. Sluss' Declaration. Thus, Appellant has also provided
evidence that the asserted utility for the claimed 
invention is considered credible by a person of ordinary 
skill in the art. 

In contrast, Office personnel have failed to provide 
evidence sufficient to show that the statement of asserted 
utility would be considered "false" by a person of ordinary 
skill in the art as required by MPEP 2107.02. While 
several literature references were cited by Office 
personnel in the Office Action mailed January 3, 2005 in 
support of the suggestion that steady state levels of mRNA 
do not necessarily correlate with steady state levels of 
proteins, none of the references are related to the protein 
encoded by SEQ ID NO:1. Instead, these references report
unique findings of scientific interest wherein researchers 
unexpectedly found that protein and mRNA levels did not 
always correlate for a unique group of proteins. Copies of
the references cited in the January 3, 2005 Office Action
are provided herewith for convenience in Evidence Appendix 
D. However, these references are not representative of the 
art for proteins in general wherein mRNA levels correlate 
quite well with protein levels. 

Appellant has shown multiple places in the originally 
filed specification wherein a specific utility for the 
claimed invention is asserted. Appellant has provided 
confirming evidence via published references of this 
utility. Finally, Appellant submitted a Declaration by one 
skilled in the art stating that the data of Examples 1 and 
2 of the patent application demonstrates "the utility of
Ovr110 as a diagnostic marker for gynecologic cancers."
Accordingly, the evidence as a whole makes clear that the
asserted utility for the claimed invention is credible and
further maintenance of any rejection under 35 U.S.C. 101 is
improper. See In re Rinehart, 531 F.2d 1048, 1052, 189 USPQ
143, 147 (CCPA 1976).

B. Enablement Rejection under 35 U.S.C. 112, first 
paragraph 

With respect to the rejection of the pending claims 
under 35 U.S.C. 112, for lack of enablement, the test of 
enablement is whether one reasonably skilled in the art 
could make or use the claimed invention from the
disclosures in the patent coupled with information known in
the art without undue experimentation. See MPEP 2164.01. 
Thus, the test of enablement is not whether any 
experimentation is necessary but whether, if 
experimentation is necessary it is undue. In re Angstadt, 
537 F.2d 498, 504, 190 USPQ 214, 219 (CCPA 1976). If the 
art typically engages in such experimentation, it is not 
considered undue. See In re Certain Limited-Charge Cell
Culture Microcarriers, 221 USPQ 1165, 1174 (Int'l Trade
Comm'n 1983), aff'd sub nom., Massachusetts Institute of
Technology v. A.B. Fortia, 774 F.2d 1104, 227 USPQ 428
(Fed. Cir. 1985). Further, information well known in the
art does not need to be described in detail in the 
specification. MPEP 2163 at page 2100-170 and Hybritech, 
Inc. v. Monoclonal Antibodies, Inc., 802 F.2d 1367, 1379- 
80, 231 USPQ 81, 90 (Fed. Cir. 1986). 

Detailed guidance for the skilled artisan to make and 
use the claimed invention is provided in teachings 
throughout the specification. Teachings in Examples 1 and
2 relating to mRNA overexpression of Ovr110 (SEQ ID NO:1)
are demonstrative to the skilled artisan of its utility as
a diagnostic marker for gynecologic cancers. Uses for the
protein encoded by SEQ ID NO:1 and antibodies against
Cancer Specific Genes such as SEQ ID NO:1 are described in
detail in the specification, for example at pages 11-12 and
14-15 of the instant application.

Further, Appellant submitted with their Reply filed 
November 22, 2005 a Declaration by inventor Dr. Susana 
Salceda which makes clear that any additional tools 
necessary to make and use the invention as claimed were 
well known and were used routinely by the skilled
artisan with information such as taught in the
specification. A copy of Dr. Salceda's Declaration is
provided herewith in Evidence Appendix E. As discussed in
detail in paragraph 6 of Dr. Salceda's Declaration, protein
sequences and/or open reading frames were routinely 
obtained by those skilled in the art at the time of filing 
the instant patent application based upon information such 
as provided in the instant specification. In particular, 
the specification teaches in Examples 1 and 2 at pages
16-18 that SEQ ID NO:1 is an mRNA molecule and thus has a set
5' to 3' orientation. From this information, one skilled
in the art knows that the protein is encoded in the forward
(5' to 3') direction of SEQ ID NO:1. This characteristic
taught in the originally filed specification limits the
potential frame translations to three possibilities.
Further, as explained in paragraph 6 of Dr. Salceda's
Declaration, one skilled in the art understands that in
general the open reading frame is: the frame of SEQ ID NO:1
encoding for a methionine near the 5' end; the frame
encoding many amino acids; and, the frame terminating with 
a stop codon. Any frame with multiple stop codons can thus 
be ruled out. Multiple tools were available by 1998, thus 
preceding the September 2, 1998 priority date of the 
instant application, which could be used to routinely 
determine the protein sequence and/or open reading frame of 
SEQ ID NO:1 based upon the information provided in the
originally filed specification. Provided with Dr. 
Salceda's Declaration are examples of results from three 
different computer programs available to those skilled in 
the art as of the filing date of the instant application. 
These examples are also provided in Evidence Appendix E. 
Quite clear from these examples is the fact that for this 
particular nucleic acid sequence, SEQ ID NO:1, there was
only one possible frame for a full length protein, frame 2, 
with a methionine near the 5' end, encoding a protein of 
over 200 amino acids in length and terminating with a stop 
codon. Thus, as shown by the Figures of Dr. Salceda's 
Declaration, the results of which are described in detail 
in paragraph 6 of Dr. Salceda's Declaration, using only the 
information disclosed in the instant specification, each of 
these programs was able to identify the open reading frame
and protein encoded by SEQ ID NO:1. This simple step
required to identify the open reading frame and protein
encoded by SEQ ID NO:1 using the characteristics of SEQ ID
NO:1 taught in the instant specification does not
constitute undue experimentation. 
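
By way of illustration only, the frame analysis described above can be
sketched in a few lines of Python. This is a minimal sketch and is not one
of the three computer programs referred to in Dr. Salceda's Declaration.
The function names forward_orfs and longest_orf and the toy sequence are
hypothetical, the toy sequence is not SEQ ID NO:1, and the stop codons of
the standard genetic code (TAA, TAG, TGA) are assumed. The sketch reads the
three forward (5' to 3') frames, records each stretch that begins at an ATG
and ends at the first in-frame stop codon, and reports the longest stretch.

```python
# Illustrative sketch only; names and the toy sequence are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def forward_orfs(seq):
    """List (frame, start, end) for each ATG...stop ORF in the forward frames.

    frame is 1, 2 or 3; start and end are 0-based indices into seq, with end
    exclusive and placed just after the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept mRNA (U) or cDNA (T) input
    orfs = []
    for offset in range(3):  # an mRNA is read 5' to 3', so only three frames
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first methionine codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((offset + 1, start, i + 3))
                start = None  # look for further ORFs in the same frame
    return orfs


def longest_orf(seq):
    """Return the longest forward ORF, or None if no ORF is found."""
    orfs = forward_orfs(seq)
    return max(orfs, key=lambda orf: orf[2] - orf[1]) if orfs else None


# Toy sequence for demonstration; it is NOT SEQ ID NO:1.
toy = "CATGGCTGCTAAAGGTTGACC"

if __name__ == "__main__":
    print(longest_orf(toy))  # (2, 1, 19)
```

For the toy sequence, the only open reading frame lies in frame 2; frames 1
and 3 contain no ATG and are thereby ruled out.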

As additional evidence that one of skill in the art 
would know there was only one possible frame for a full 
length protein, frame 2, in SEQ ID NO:1, Appellant provided
with their Reply mailed April 26, 2006 references 
evidencing that, in general, the sequence flanking 
functional initiator codons in eukaryotic mRNA sequences is 
a nonrandom sequence, referred to as the Kozak consensus 
sequence. Also provided were multiple references 
evidencing that it was known that, in general, the 5'- 
proximal ATG serves as the initiator codon for the majority 
of mRNAs. Copies of these references are provided herewith
in Evidence Appendix F. 

The Declaration by skilled artisan Dr. Patrick Sluss 
(see Evidence Appendix C) serves as yet additional evidence 
that those skilled in the art as of 1998 typically engaged 
in experimentation such as required in the instant 
application to identify antibodies binding to the protein 
encoded by SEQ ID NO:1. In paragraph 7, Dr. Sluss states
"once the nucleic acid sequence is specified there were
several approaches available to those skilled in the art in
1998 to generate antibodies that could be used to formulate 
tests for circulating proteins originating from the nucleic 
acid sequence revealed." Further in paragraph 8, Dr. Sluss 
states "[t]he nucleic acid sequences contain all the 
information needed for one skilled in the art to predict, 
using software tools available in 1998, all proteins that 
could be coded. These protein sequences could then be used 
in homology searches, again using software and databases 
available at the time, to identify target immunogens for 
specific antibody generation." Clear from Dr. Sluss'
Declaration is that once the nucleic acid sequence of SEQ ID
NO:1 had been identified as an ovary specific gene
associated with ovarian cancer, obtaining antibodies to a 
protein encoded thereby useful in an antibody-based 
diagnostic method was routine. 

Also evidenced by both Declarations is that well known 
and routine to those of skill in the art at the time of 
filing the instant application were methods for expressing 
proteins encoded by a nucleotide sequence such as SEQ ID 
NO:1 and generating antibodies thereto. See paragraph 8 of
Dr. Salceda's Declaration and paragraph 7 of Dr. Sluss'
Declaration.

Thus, evidenced by the record as a whole is that
teachings of the instant specification provide adequate
disclosure when coupled with information known to those
skilled in the art to make and use the invention as claimed
without undue experimentation. Not only is the native
protein encoded by the longest open reading frame (ORF) of
SEQ ID NO:1, but the ORF begins with a functional
translation initiator codon flanked by the sequence commonly
referred to as the Kozak consensus sequence. Further, the ORF begins at the
5'-proximal ATG in SEQ ID NO:1, the initiator codon for the
majority of mRNAs. Both the Kozak consensus sequence and
5'-proximal ATG are well-known characteristics of the
coding sequence of nucleic acids and therefore need not be
expressly outlined in the specification. The disclosed
structures and features of SEQ ID NO:1, coupled with the
tools available at the time of filing the instant
application to identify the open reading frame in a nucleic
acid sequence which are outlined in Dr. Salceda's and Dr.
Sluss' Declarations, clearly provide sufficient information
to enable one of skill in the art to routinely make and use
the instant claimed invention without undue
experimentation, thus meeting the requirements of 35 U.S.C.
112, first paragraph, with respect to enablement.
Further maintenance of this enablement rejection is
therefore improper.

C. Written Description Rejection under 35 U.S.C. 112, 
first paragraph 

Finally, with respect to the rejection of the pending 
claims under 35 U.S.C. 112, for written description, it is 
respectfully submitted that the disclosure not only 
distinguishes the claimed invention from other materials 
but also leads one of skill in the art to a conclusion that 
the inventors were in possession of the claimed species. 
In the instant application, Appellant provided in the 
originally filed specification the nucleic acid sequences 
for polynucleotide SEQ ID NO:1, and multiple fragments
thereof including SEQ ID NO:12 and 13. Further, Appellant
taught in the originally filed specification that SEQ ID
NO:1 is an mRNA, thus establishing its 5' to 3'
orientation. Polynucleotide SEQ ID NO:1 is inclusive of
the entire open reading frame for the protein encoded
thereby. In addition, polynucleotide SEQ ID NO:1 includes
a Kozak consensus sequence, well established in the art as
a sequence flanking functional initiator codons in the
majority of eukaryotic mRNA sequences. Further, this Kozak
consensus sequence flanks the 5'-proximal ATG, well known
in the art to serve as the initiator codon for the majority
of mRNAs. Thus, the nucleic acid sequence taught for
polynucleotide SEQ ID NO:1 in the originally filed
specification includes the classic structural 
characteristics well established in the art to correlate 
with an open reading frame of a nucleic acid sequence 
encoding the protein. The instant application also 
includes a description of various methods for making the 
claimed antibodies and methods for using the antibodies. 
Thus, the instant specification meets both policy 
objectives of the written description requirement. See 
MPEP 2163 and In re Barker, 559 F.2d 588, 592 n.4, 194 USPQ 
470, 473 n.4 (CCPA 1977) and Regents of the University of 
California v. Eli Lilly, 119 F.3d 1559, 1566, 43 USPQ2d 
1398, 1404 (Fed. Cir. 1997), cert. denied, 523 U.S. 1089
(1998).

Further, MPEP 2163 states that "in the molecular 
biology arts, if an applicant disclosed an amino acid 
sequence, it would be unnecessary to provide an explicit 
disclosure of nucleic acid sequences that encoded the amino 
acid sequence. Since the genetic code is widely known, a 
disclosure of an amino acid sequence would provide 
sufficient information such that one would accept that an 
applicant was in possession of the full genus of nucleic 
acids encoding a given amino acid sequence, but not
necessarily any particular species. Cf. In re Bell, 991
F.2d 781, 785, 26 USPQ2d 1529, 1532 (Fed. Cir. 1993) and In 
re Baird, 16 F.3d 380, 382, 29 USPQ2d 1550, 1552 (Fed. Cir. 
1994)." This acknowledgement by the USPTO that "it would 
be unnecessary to provide an explicit disclosure of nucleic 
acid sequences that encoded the amino acid sequence" 
(emphasis added) is made for determining nucleic acid 
sequences from an amino acid sequence where it is well 
known that degeneracy of the genetic code will result in 
multiple nucleic acid sequences. Since an explicit 
disclosure is not required to meet written description 
under these facts, Appellant believes it is improper to 
require explicit disclosure in the instant application, 
where the genetic code has been acknowledged to be widely 
known and degeneracy plays no role whatsoever in
determining an amino acid sequence that is encoded by a
disclosed nucleic acid sequence. 

The Examiner relies upon Fiers v. Revel, 25 USPQ2d 
1601 and Amgen v. Chugai Pharmaceutical Co. Ltd., 18 USPQ2d
1016 to suggest that adequate written description requires
more than a mere statement that it is part of the invention
and reference to a potential method of isolation; the
compound itself is required. However, the facts in those
cases are very different from those herein. In those cases,
the claims were drawn to nucleic acid sequences for which
no sequence information whatsoever was set forth in the
patent application.

More relevant to the instant application are more 
recent decisions from the Court of Appeals for the Federal 
Circuit such as Falkner v. Inglis, 448 F.3d 1357, 1366, 79 
USPQ2d 1001, 1007 (Fed. Cir. 2006) wherein the Federal 
Circuit explained that, "(1) examples are not necessary to 
support the adequacy of a written description; (2) the 
written description standard may be met . . . even where 
actual reduction to practice of an invention is absent; and 
(3) there is no per se rule that an adequate written 
description of an invention that involves a biological 
macromolecule must contain a recitation of known structure" 
and Capon v. Eshhar, 418 F.3d 1349, 1357, 76 USPQ2d 1078, 
1085 (Fed. Cir. 2005) wherein the Court stated that "The
'written description' requirement must be applied in the
context of the particular invention and the state of the 
knowledge... As each field evolves, the balance also 
evolves between what is known and what is added by each 
inventive contribution." Also see MPEP 2163. 

The genetic code, coupled with reasonable 
predictability associated with a proximal ATG and Kozak 
sequences, establishes a strong structural correlation
between a nucleic acid sequence and a protein encoded
thereby. A strong structural correlation between an
encoded protein and antibodies raised thereto is also
well-established and has been recognized by the Courts in
Hybritech, Inc. v. Monoclonal Antibodies, Inc., 802 F.2d
1367, 1384, 231 USPQ 81, 94 (Fed. Cir. 1986), cert.
denied, 480 U.S. 947 (1987). Further, Appellant discloses
in the specification
that antibodies raised against a protein encoded by the
disclosed nucleic acid sequence, SEQ ID NO:1, are useful in
detecting gynecologic cancers and lung cancer. Evidence 
confirming the disclosed utility has been submitted. 
Accordingly, one skilled in the art would be able to 
predict with a reasonable degree of confidence the 
structure of the claimed invention from a recitation of its 
function. Express disclosure of the structure of the 
protein or antibodies thereto is therefore not required in 
the instant application to meet the written description 
requirement of 35 U.S.C. 112. 
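
The correlation between a nucleic acid sequence and the protein it 
encodes can be illustrated with a minimal sketch that reads codons 
from the first ATG to the first stop codon using the standard genetic 
code. The function name and the short example sequence are 
hypothetical and are not taken from the Sequence Listing; the sketch 
also assumes an uninterrupted open reading frame and ignores Kozak 
context and alternative start sites. 

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon.

    Hypothetical helper for illustration only; unknown codons become 'X'.
    """
    seq = dna.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# Made-up 13-nucleotide example: ATG GCC TAA -> Met, Ala, stop.
print(translate_from_first_atg("GGATGGCCTAAGG"))  # prints MA
```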

Finally, MPEP 2163 states "[i]n most technologies 
which are mature, and wherein the knowledge and level of 
skill in the art is high, a written description question 
should not be raised for claims present in the application 
when originally filed, even if the specification discloses 
only a method of making the invention and the function of 
the invention." See, e.g., In re Hayes Microcomputer 
Products, Inc. Patent Litigation, 982 F.2d 1527, 1534-35, 
25 USPQ2d 1241, 1246 (Fed. Cir. 1992). In 1986, the Court 
of Appeals for the Federal Circuit found that raising 
monoclonal antibodies is conventional or well known to one 
of ordinary skill in the art and need not be disclosed in 
detail. Hybritech Inc. v. Monoclonal Antibodies, Inc., 802 
F.2d 1367 (Fed. Cir. 1986), cert. denied, 480 U.S. 947 
(1987). Thus, as of 1998, when the instant patent 
application was filed, raising antibodies was a "mature" 
technology. Further, antibodies and methods of their use 
were described in detail at pages 6, 11-12 and 14-15 of the 
original specification and were claimed in originally filed 
claims 9 through 13. Thus, a written description question 
should not be raised with respect to the instant claimed 
invention. See, e.g., In re Hayes Microcomputer Products, 
Inc. Patent Litigation, 982 F.2d 1527, 1534-35, 25 USPQ2d 
1241, 1246 (Fed. Cir. 1992). 

Further maintenance of this written description 
rejection is therefore improper. 

D. Conclusion 

Express disclosure in the specification of the amino 
acid sequence of a protein or antibodies thereto should not 
be required in the instant application to meet the written 
description and enablement requirements of 35 U.S.C. 112, 
first paragraph, or the utility requirements of 35 U.S.C. 
101 with respect to the instant claimed invention. 

Written description, enablement and utility are all 
determined with respect to a person of ordinary skill in 
the art. Accordingly, during the prosecution of this case, 
Appellants submitted Declarations by two different persons 
of ordinary skill in the art, specifically, Dr. Susana 
Salceda and Dr. Patrick Sluss addressing in detail how each 
understood the information disclosed in the instant 
specification to show possession of the claimed invention 
by the inventors and how each could perform experimentation 
routine as of the filing date of the instant application to 
make and use the instant claimed invention in accordance 
with the utility taught in the patent application. 

During prosecution of this case, Appellants submitted 
evidence confirming the utility of the claimed invention in 
accordance with teachings of the specification. 

Appellants also provided, during prosecution of this 
case, evidence in the form of literature references and 
computer programs available prior to the filing date of the 
instant application, demonstrating that the characteristics 
of the disclosed nucleic acid sequence are enabling for the 
claimed invention. 

Finally, Appellants have identified case law and 
sections of the MPEP relevant to the instant fact situation 
which support the position that express disclosure of the 
structure of the protein or antibodies thereto is not 
required in the instant fact situation. 

In contrast, it is respectfully submitted that the 
Examiner has provided no specific evidence or case law 
relevant to the instant fact situation to support the 
suggestion that the specification, which expressly 
discloses the nucleic acid sequence of SEQ ID NO:1, does 
not provide enough information to indicate for which 
proteins the claimed antibodies are specific. It is also 
respectfully submitted that the Examiner has failed to 
provide any specific evidence or case law relevant to the 
instant fact situation to support the suggestion that 
without identifying for which protein the claimed 
antibodies are specific in the specification, the 
antibodies lack utility. 

Accordingly, the evidence in the prosecution history, 
when viewed as a whole, is indicative of the instant 
application meeting the written description and enablement 
requirements of 35 U.S.C. 112, first paragraph, and the 
utility requirements of 35 U.S.C. 101. 

Reversal of the rejections under 35 U.S.C. 101 and 35 
U.S.C. 112, first paragraph, is therefore respectfully 
requested. 

For the reasons given in this Appeal Brief, reversal 
of the Examiner's rejections is requested. 



DATE: January 21, 2010 

LICATA & TYRRELL P.C. 
66 E. Main Street 
Marlton, NJ 08053 
856-810-1515 

E-mail: ktyrrell@licataandtyrrell.com 

VIII. Claims Appendix 



Claims 1-13 (canceled) 

Claim 14 (previously presented): An isolated antibody 
or antibody fragment that binds specifically to a protein 
encoded by polynucleotide sequence SEQ ID NO:1. 

Claims 15-20 (canceled) 

Claim 21 (previously presented): The isolated antibody 
or antibody fragment of claim 14 wherein the antibody is a 
monoclonal antibody. 

Claim 22 (previously presented): The isolated antibody 
or antibody fragment of claim 14 wherein the antibody or 
antibody fragment is attached to a reagent selected from 
the group consisting of radioactive reagents, fluorescent 
reagents and enzymatic reagents. 

Claim 23 (previously presented): The isolated antibody 
or antibody fragment of claim 22 wherein the enzymatic 
reagent is horseradish peroxidase or alkaline phosphatase. 

Claim 24 (previously presented): The isolated antibody 
or antibody fragment of claim 14 wherein the antibody or 
antibody fragment specifically binds to protein in cells, 
tissues, tissue extracts or bodily fluids. 

Claim 25 (previously presented): The isolated antibody 
or antibody fragment of claim 24 wherein the antibody is a 
monoclonal antibody. 

Claim 26 (previously presented): The isolated antibody 
or antibody fragment of claim 24 wherein the bodily fluids 
are selected from the group consisting of blood, urine, 
saliva and bodily secretions. 

Claim 27 (previously presented): The isolated 
antibody or antibody fragment of claim 26 wherein blood is 
whole blood, plasma, or serum. 

Claim 28 (previously presented): A method for binding 
an antibody or antibody fragment to a protein encoded by 
polynucleotide sequence SEQ ID NO:1 on a cell comprising 
contacting the cell with an isolated antibody or antibody 
fragment that binds specifically to a protein encoded by 
polynucleotide sequence SEQ ID NO:1. 

Claims 29-34 (canceled) 

Claim 35 (previously presented): The method of claim 
28 wherein the antibody is a monoclonal antibody. 

Claim 36 (previously presented): The method of claim 
28 wherein the antibody or antibody fragment is attached to 
a reagent selected from the group consisting of radioactive 
reagents, fluorescent reagents and enzymatic reagents. 

Claim 37 (previously presented): The method of claim 
36 wherein the enzymatic reagent is horseradish peroxidase 
or alkaline phosphatase. 

Claim 38 (previously presented): An isolated antibody 
or antibody fragment which binds specifically to a fragment 
of a protein encoded by polynucleotide sequence SEQ ID 
NO:1, wherein the fragment of protein encoded by 
polynucleotide sequence SEQ ID NO:1 is encoded by 
polynucleotide sequence SEQ ID NO:12 or 13. 

Claim 39 (previously presented): The isolated antibody 
or antibody fragment of claim 38 wherein the fragment of 
protein encoded by polynucleotide sequence SEQ ID NO:1 is 
encoded by polynucleotide sequence SEQ ID NO:12. 

Claim 40 (previously presented): The isolated antibody 
or antibody fragment of claim 38 wherein the fragment of 
protein encoded by polynucleotide sequence SEQ ID NO:1 is 
encoded by polynucleotide sequence SEQ ID NO:13. 

Claim 41 (previously presented): The isolated antibody 
or antibody fragment of claim 38 wherein the antibody is a 
monoclonal antibody. 

Claim 42 (previously presented): The isolated antibody 
or antibody fragment of claim 14 wherein the antibody or 
antibody fragment is attached to a cytotoxic agent. 

Claim 43 (previously presented): The isolated antibody 
or antibody fragment of claim 42 wherein the cytotoxic 
agent is selected from the group consisting of drugs, 
toxins and radionuclides. 

Claim 44 (previously presented): A method for binding 
an antibody or antibody fragment to a protein encoded by 
polynucleotide sequence SEQ ID NO:1 on a cell comprising 
contacting the cell with an isolated antibody or antibody 
fragment that binds specifically to a fragment of protein 
encoded by polynucleotide sequence SEQ ID NO:1, wherein the 
fragment of protein encoded by polynucleotide sequence SEQ 
ID NO:1 is encoded by polynucleotide sequence SEQ ID NO:12 
or 13. 

Claim 45 (previously presented): The method of claim 
44 wherein the fragment of protein encoded by 
polynucleotide sequence SEQ ID NO:1 is encoded by 
polynucleotide sequence SEQ ID NO:12. 

Claim 46 (previously presented): The method of claim 
44 wherein the fragment of protein encoded by 
polynucleotide sequence SEQ ID NO:1 is encoded by 
polynucleotide sequence SEQ ID NO:13. 

Claim 47 (previously presented): The method of claim 
44 wherein the antibody is a monoclonal antibody. 

Claim 48 (previously presented): The method of claim 
28 wherein the isolated antibody or antibody fragment is 
attached to a cytotoxic agent. 

Claim 49 (previously presented): The method of claim 
48 wherein the cytotoxic agent is selected from the group 
consisting of drugs, toxins and radionuclides. 

B7-H4 Is Highly Expressed in Ductal and Lobular 
Breast Cancer 



Barbara Tringler, 1 Shaoqiu Zhuo, 2 
ABSTRACT 

Purpose: This study was designed to investigate the 
expression of B7-H4 protein, a member of the B7 family that 
is involved in the regulation of antigen-specific immune 
responses, in normal breast and in primary and metastatic 
breast carcinomas. 

Experimental Design: Archival formalin-fixed tissue 
blocks from breast cancers and normal somatic tissues were 
evaluated for B7-H4 expression by immunohistochemistry 
with manual and automated image analysis. The proportion 
of B7-H4-positive cells and the intensity of B7-H4 staining 
were compared with histologic type, grade, stage, hormone 
receptor status, and HER-2/neu status. 

Results: B7-H4 was detected in 165 of 173 (95.4%) 
primary breast cancers and in 240 of 246 (97.6%) metastatic 
breast cancers. B7-H4 staining intensity was greater in 
invasive ductal carcinomas [24.61 relative units (RU)] and 
in invasive lobular carcinomas (15.23 RU) than in normal 
breast epithelium (4.30 RU, P = 0.0003). Increased staining 
intensity was associated with negative progesterone receptor 
status (P = 0.014) and history of neoadjuvant chemotherapy 
(P = 0.004), and the proportion of B7-H4-positive cells was 
associated with negative progesterone receptor (P = 0.001) 
and negative HER-2/neu (P = 0.024) status. However, there 
was no statistically significant relationship between the 
proportion of B7-H4-positive cells or staining intensity and 
grade, stage, or other clinicopathologic variables. Low levels 
of B7-H4 expression were also detected in epithelial cells of 
the female genital tract, lung, pancreas, and kidney, but 
B7-H4 was generally absent in most other normal somatic tissues. 

Conclusions: The nearly ubiquitous expression of B7-H4 
in breast cancer, independent of tumor grade or stage, 
suggests a critical role for this protein in breast cancer biology. 

INTRODUCTION 

Numerous therapeutic modalities are available for the 
adjuvant treatment of advanced breast cancer including 
radiotherapy, conventional chemotherapy with cytotoxic antitumor 
agents, hormone therapy (aromatase inhibitors, 
luteinizing-hormone releasing-hormone analogues), bisphosphonates, and 
signal-transduction inhibitors (1). The current approach to the 
optimal treatment selection for breast cancer is multidisciplinary 
and based on several factors, including clinical stage, biological 
characteristics of the cancer, disease recurrence, patient's age and 
preferences, as well as risks and benefits associated with each 
treatment protocol, which help clinicians to stratify patients for 
appropriate treatment decisions. However, despite the great 
variety of adjuvant treatment options, many patients either 
respond poorly or not at all to any of the above-described 
therapeutic modalities. Thus, there is a need to identify new 
molecular markers for breast cancer that could provide further 
therapeutic targets for patients that are unlikely to respond 
to current treatment options. 

We initially identified and characterized DD-O110 as a 
novel gene encoding a predicted membrane glycoprotein that is 
overexpressed in breast and ovarian cancer with relatively little 
expression in normal somatic tissues, by quantitative PCR 
analysis of over 200 human tissue samples.3 Based on the 
predicted amino acid sequence, we subsequently determined 
that DD-O110 is homologous to B7-H4 (also known as B7x or 
B7S1), a recently discovered B7 family member. B7 family 
members and their receptors play critical roles in the regulation 
of antigen-specific immune responses (2). B7-H4 ligation to its 
receptor BTLA on T lymphocytes results in inhibition of T-cell 
activation, cytokine secretion, and the development of 
cytotoxicity (3-6). B7-H4 mRNA but not protein expression has been 
detected in a wide range of normal somatic tissues, including 
liver, skeletal muscle, kidney, pancreas, and small bowel (3, 5). 
However, cell surface expression of B7-H4 protein was induced 
upon stimulation of T cells, B cells, monocytes, and dendritic 
cells in addition to a constitutive B7-H4 protein expression 
in lung and ovarian cancer (5). The significance of B7-H4 
expression in normal or malignant nonhematopoietic cell 
populations has not been determined. 

The present study was designed to test the hypothesis that 
B7-H4 protein is consistently overexpressed in primary and 
metastatic breast cancer and to determine if B7-H4 expression 
is dependent on histologic type, grade, stage, estrogen receptor 
(ER), progesterone receptor (PR), or HER-2/neu status, or on 
other clinical variables. 

MATERIALS AND METHODS 

Tissue Samples. Tissues were obtained from 173 patients 
with primary breast cancer who underwent surgery at the 
University of Colorado Hospital, Denver, CO. Tissue blocks 
were assembled from the archival collections from the 
Department of Pathology and included 155 invasive ductal 
carcinomas (152 cases of ductal carcinoma of the usual type and 
3 cases of invasive tubular carcinoma) and 18 lobular 
carcinomas of the breast. Cases with mixed patterns of histologic 
differentiation were excluded from the analysis. The mean age of 
patients at the time of diagnosis was 55.5 years (±12.8; range, 
29-89 years). The tumors were classified as American Joint 
Committee on Cancer pathologic stage I (90 cases), stage IIa 
(35 cases), stage IIb (20 cases), stage IIIa (16 cases), stage IIIb 
(5 cases), stage IIIc (5 cases), and stage IV (2 cases). The study 
also included 246 breast cancer-positive lymph nodes from a 
subset of 27 patients who were part of the primary study 
population. We also evaluated normal breast tissue from women 
(n = 15) undergoing reduction mammoplasty but with no history 
of breast cancer. In addition, a broad spectrum of normal adult 
and fetal somatic tissues (n = 314) was evaluated for B7-H4 
expression to confirm the specificity of the B7-H4 protein to 
breast cancer cells (Table 3). 

Information on ER, PR, and HER-2/neu status was 
collected from the original surgical pathology reports. 
HER-2/neu status was determined by fluorescence in situ hybridization 
analysis (ACIS; Chromavision, San Juan Capistrano, CA). 
Patient survival data was provided by the University Hospital 
Tumor Registry for all patients. These data reported patients that 
had expired following the diagnosis of breast cancer but did not 
include information regarding disease recurrence or cause of 
death. This study was reviewed by the Colorado Multiple 
Institutional Review Board (Protocol 00-1094). 

Development and Characterization of the A57.1 Antibody 
Directed Against B7-H4. Monoclonal antibody production 
and characterization was done at diaDexus (South San 
Francisco, CA). Seven- to 8-week-old BALB/c mice were 
immunized twice weekly over a 5- to 6-week period with 10 µg 
of the recombinant B7-H4 protein, corresponding to the 
complete extracellular domain of the native protein. 
Lymphocytes were subsequently isolated and fused with P3x63Ag8.653 
cells (7) to form a hybridoma using standard techniques. 
Hybridoma supernatants were screened by ELISA for reactivity 
against B7-H4 and for the absence of cross-reactivity with an 
unrelated recombinant protein. B7-H4-positive hybridomas 
were cloned by single-cell sorting using a Coulter EPICS 
Elite-ESP Flow Cytometer (Beckman-Coulter, Miami, FL). The 
A57.1 monoclonal antibody was selected for use in subsequent 
studies. 

Western Blot Analysis. SKBR3, MCF-7, and RK3E 
cells were obtained from the American Type Culture Collection 
(Manassas, VA). RK3E cells were infected with a recombinant 
retrovirus expressing either B7-H4 or alkaline phosphatase 
used as a control. Twenty-five micrograms of protein extracts 
were separated on a precast 4% to 12% SDS polyacrylamide 
mini gel (Nupage; Invitrogen, Carlsbad, CA) and transferred to 
an Immobilon-P polyvinylidene difluoride membrane 
(Invitrogen). The membrane was blocked for 1 hour at room 
temperature using 5% nonfat dry milk and incubated overnight 
with the A57.1 antibody (1 µg/mL). The blot was developed 
using a horseradish peroxidase-linked goat anti-mouse 
immunoglobulin (Jackson ImmunoResearch Laboratories, Inc., 
West Grove, PA; 1:10,000) for 1 hour at room temperature and 
subsequently visualized using enhanced chemiluminescence 
reagent per manufacturer's directions (Amersham Biosciences, 
Piscataway, NJ). 

Immunohistochemical Staining. Formalin-fixed, 
paraffin-embedded tissue blocks were sectioned to 5 µm and 
mounted on charged glass slides (Superfrost Plus, Fisher 
Scientific, Pittsburgh, PA). Endogenous peroxidase activity was 
blocked with 3.0% hydrogen peroxide for 15 minutes. Antigen 
retrieval was done in a citrate buffer [20 mmol/L (pH 6.0)] at 
120°C for 10 minutes. Staining was conducted on a DAKO 
autostainer (DakoCytomation, Carpinteria, CA) using an 
indirect avidin-biotin immunoperoxidase method (Vector 
Laboratories, Burlingame, CA). Sections were incubated at 25°C 
for 60 minutes with the A57.1 antibody (0.8 µg/mL). Negative 
controls were run on all sections at 0.8 µg/mL of a 
subclass-matched IgG1κ (BD PharMingen, San Diego, CA), generated 
against unrelated antigens. B7-H4 staining was visualized using 
3,3'-diaminobenzidine (DakoCytomation). Specificity of 
B7-H4 staining was confirmed by a blocking experiment with 
preincubation of the A57.1 antibody with the full-length B7-H4 
protein (7.80 ng/mL) at 25°C for 60 minutes, before 
immunohistochemical processing. 

Evaluation of B7-H4 Staining. The proportion of 
B7-H4-positive cells for each case was scored on a scale from 0% to 
100%. Results represent the average proportion of 
B7-H4-positive cells within the entire tumor area of a single 
representative tissue block (0-10% positive cells, >10-50% 
positive cells, >50-80% positive cells, and >80-100% positive 
cells). The B7-H4 stained slides were digitally scanned using a 
Zeiss Axioskop 50 microscope fitted to a Syncroscan imaging 
system (Syncroscopy, Cambridge, United Kingdom). Image 
manipulation and preparation was done using Adobe Photoshop 
6.0 and image analysis of tumor and normal breast epithelium 
was done using Media Cybernetics Optimas 6.5 (Media 
Cybernetics, Silver Spring, MD). Median delta base 10 intensity 
values (derived from 256 Grayscale median pixel Luminosity) 
were corrected by subtraction of hematoxylin-based background 
staining and recorded as relative units (RU). 
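
The four scoring categories described above can be expressed as a 
small helper. This is only an illustrative sketch: the function name 
is hypothetical, and the handling of boundary values (10%, 50%, and 
80% fall in the lower category) is inferred from the ">" signs in the 
category labels rather than stated by the scoring protocol. 

```python
def proportion_category(percent_positive: float) -> str:
    """Map a percentage of B7-H4-positive cells to one of four categories.

    Hypothetical helper; boundaries follow the labels
    0-10%, >10-50%, >50-80%, and >80-100%.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive <= 10:
        return "0-10%"
    if percent_positive <= 50:
        return ">10-50%"
    if percent_positive <= 80:
        return ">50-80%"
    return ">80-100%"


# Example scores chosen for illustration only.
for score in (5, 10, 45, 80, 95):
    print(score, proportion_category(score))
```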

Statistical Analysis. The association of proportion of 
B7-H4-positive cases and proportion of B7-H4-positive cells with 
categorical clinicopathologic characteristics was assessed by 
Fisher's exact test or the χ² test where appropriate. 
The differences between median staining intensity and 
clinicopathologic variables were evaluated by the Wilcoxon rank sum 
test or the Kruskal-Wallis test where appropriate. Statistically 
significant univariate relationships were further evaluated 
by multivariate analysis. A log-rank test was used to test for 
differences in overall patient survival. Ps ≤ 0.05 were considered 
statistically significant. Statistical analyses were done using SAS 
v8.1 (SAS Institute, Cary, NC). 
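
As a hedged illustration of how a Fisher's exact test works on a 2 x 2 
table, the sketch below sums hypergeometric probabilities that are no 
larger than that of the observed table. It is not the SAS procedure 
used in the study. The example counts are taken from the PR rows of 
Table 2 (71 of 120 PR-positive and 45 of 53 PR-negative cases with 
>80% B7-H4-positive cells), but collapsing the four proportion 
categories into two is an assumption made here for illustration, so 
the result is not expected to reproduce the published P value. 

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Hypothetical helper: sums the probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so ties with the observed table are included.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: PR+ and PR-; columns: >80% positive cells and <=80% positive cells.
p_value = fisher_exact_two_sided(71, 49, 45, 8)
print(f"two-sided P = {p_value:.4f}")
```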



RESULTS 

Characterization of B7-H4 Antibody. Specificity of the 
A57.1 antibody for B7-H4 protein was confirmed by Western blot 
analysis (Fig. 1). The A57.1 antibody recognized a major protein 
form with a diffuse band at ~60 to 80 kDa as well as several 
minor species of lower molecular weight in RK3E B7-H4 cells 
overexpressing human B7-H4 protein, but did not detect B7-H4 
protein in negative control RK3E alkaline phosphatase cells 
expressing alkaline phosphatase. Similar protein bands were 
noted in extracts of both MCF-7 (low-level B7-H4 mRNA 
expression) and SKBR3 (high-level B7-H4 mRNA expression) 
breast cancer cells. In other experiments, we have shown that the 
size heterogeneity observed for B7-H4 proteins in tumor tissues 
and cell lines is due to variable N-linked glycosylation.4 
Preincubation of the B7-H4.A57.1 antibody with the full-length 
recombinant B7-H4 protein completely blocked the staining in 
histologic sections. 

Fig. 1 Western blot analysis. The A57.1 antibody detected a major 
protein band at ~60 to 80 kDa and several minor bands of lower 
molecular weight in a RK3E cell line overexpressing B7-H4 (RK3E 
B7-H4). A single band of similar size was found in two breast cancer cell 
lines (MCF-7, SKBR3) expressing native B7-H4 mRNA but was not 
identified in control RK3E cells (RK3E AP). 

Primary Breast Tumors. B7-H4 expression was 
detected in invasive breast cancers, including 147 of 155 (94.8%) 
cases of invasive ductal carcinoma and 18 of 18 (100%) cases of 
invasive lobular carcinoma (Table 1). In almost all cases of 
invasive ductal (Fig. 2A) and lobular (Fig. 2B) carcinomas, 
B7-H4 expression was present diffusely throughout the cytoplasm 
with a pronounced membranous component. However, in rare 
cases of ductal carcinoma (n = 7), the tumor cells showed only 
localized, incomplete cytoplasmic, and membranous staining. 
There was no significant association observed between B7-H4 
status (positive cases versus negative cases) and any 
clinicopathologic variables or overall patient survival (P = 0.910) using 
the log-rank test. 

The proportion of B7-H4-positive cells in most cases of 
ductal and lobular carcinomas was >80% of the tumor cells. 
However, in a small subset of B7-H4-positive carcinoma cases, 
<10% of the tumor cells were positive (Table 1). When the 
cancer cases were grouped by proportions of B7-H4-positive 
cells (Table 2), only PR and HER-2/neu status were significantly 
associated with the proportion of positive cells by univariate 
analysis. A multivariate analysis found 
that grade was an effect modifier in the relationship of PR to 
percentage of B7-H4-positive cells. A negative PR status was a 
significant predictor of increasing staining intensity only in 
grade 3 carcinomas. HER-2/neu was not significant at any 
grade level. 

The median B7-H4 staining intensity was greater in 
invasive ductal carcinomas (24.61 RU) and in invasive lobular 
carcinomas (15.23 RU) than in normal breast epithelium (4.30 
RU) and the differences between these three groups were 
statistically significant (Table 1; Fig. 3). Univariate analysis 
showed that increasing B7-H4 staining intensity was associated 
with a negative PR status and with a history of neoadjuvant 
chemotherapy (azidothymidine, Adriamycin, Taxotere, and 
Cytoxan). No other statistically significant associations were 
observed. Multivariate analysis found that grade was again an 
effect modifier. A significant relationship between neoadjuvant 
chemotherapy and increasing B7-H4 staining intensity was found 
only in grade 3 carcinomas. Negative PR status approached 
significance in grade 3 carcinomas (P = 0.059). 

Lymph Node Metastases. B7-H4 expression was 
detected in tumor cells of 240 of 246 (97.6%) breast 
cancer-positive lymph nodes from 27 patients with nodal 
metastases. Within the metastatic foci, B7-H4 expression was 
cytoplasmic and predominantly circumferential membranous in 
distribution (Fig. 2C). The B7-H4 expression pattern of 
metastatic cells was always identical between individual 
lymph nodes from the same patient. Furthermore, B7-H4 
expression in tumor cells of nodal metastases was identical 
to that observed in the corresponding primary tumors. Within 
B7-H4-negative lymph nodes with metastatic carcinoma (n = 6), 
five were from the same patient. In that patient, the primary 
tumor showed B7-H4 expression in only 5% of the tumor 
cells. The other B7-H4-negative lymph node was from a 
patient whose primary tumor showed B7-H4 expression in 
only 10% of the tumor cells. Three other lymph nodes from 
that same patient showed B7-H4 expression in a very low 
proportion of metastatic tumor cells. Focal membranous and 
granular cytoplasmic B7-H4 expression was also detected in 
scattered follicular dendritic cells of hyperplastic lymphoid 
follicles of lymph nodes from patients with metastatic 
carcinoma but was never seen in lymph nodes from patients 
that were negative for carcinoma. 

Normal Somatic Tissue. Predominantly apical, luminal 
membranous B7-H4 expression was observed in ductal and 
lobular epithelial cells in 15 of 15 (100%) cases of normal breast 
tissue (Table 1; Fig. 2D). In one case, however, there was 
circumferential membranous B7-H4 expression, equivalent to 
that seen in breast carcinomas. B7-H4 expression was never 
identified in myoepithelial cells or in other cellular components 
of normal breast tissue. 

A broad spectrum of normal adult and fetal somatic 
tissues was evaluated to test for the expression of B7-H4 in 
other cell types (Table 3). The confluent circumferential 
membranous pattern of expression, as seen in breast cancer 
cases, was never observed in normal adult somatic tissues of 
any anatomic site. However, apical membranous expression 
was noted in fallopian tubal epithelium (17 of 17), endometrial 
glandular epithelium (19 of 25), and occasionally in 
endometrial luminal surface epithelium. In addition, uniform 

Table 1. B7-H4 expression (no. positive cases, proportion of positive cells, and median staining intensity) in primary breast cancer and normal breast tissue

                                             No. cases (%) grouped by proportion              Staining intensity
                                             of B7-H4-positive cells                          (image analysis)
Histological diagnosis      No. positive     0-10%     >10-50%   >50-80%   >80-100%   P*      Median RU (range)    P‡
                            cases (%)
Invasive ductal carcinoma†  147/155 (94.8)   26 (17)   15 (10)   11 (7)    103 (66)   0.132   24.61 (0-75.00)      0.0003
Invasive lobular carcinoma  18/18 (100)      2 (11)    0         3 (17)    13 (72)            15.23 (0.39-55.08)
Normal breast tissue        15/15 (100)      1 (7)     1 (7)     5 (33)    8 (53)             4.30 (1.95-13.67)

*χ² test.
†Including three cases subclassified as tubular carcinoma; due to rounding, percentages in parentheses may not add up to 100%.
‡Kruskal-Wallis test.



cytoplasmic expression without a membranous component was 
observed in endocervical glands (10 of 10). Focal membranous 
expression was detected in the bronchial epithelium of the 
lung (4 of 4), the columnar epithelium of the gallbladder (1 of 
5), the ductal and occasionally acinar epithelium of the 
pancreas (10 of 10), the distal convoluted tubules of the 
kidney (5 of 11), and the transitional epithelium of the ureter 
(2 of 3) and the urinary bladder (4 of 4). Focal cytoplasmic 
B7-H4 expression was also noted in the pars intermedia of 1/4 
sections of normal pituitary. B7-H4 cytoplasmic expression 
was further detected in the squamous epithelium of the larynx 
(2 of 3), as well as the cortex and cuticle of hair shafts and in 
the inner zone of the outer root sheath of hair follicles (7 of 7). 
All other normal somatic tissues were consistently negative for 
B7-H4 expression. 

Within fetal tissue, B7-H4 expression was noted in the 
bronchial epithelium of the lung, the distal convoluted tubules 
and collecting ducts of the kidney, the hair follicles, the 
amniotic epithelium, and in cytotrophoblast cells of chorionic 
villi of first trimester placentas. By contrast, chorionic villi from 
term placentas were always negative for B7-H4 expression 
(Table 3). 



DISCUSSION 

Despite the use of a wide range of adjuvant treatment 
options, including radiotherapy, conventional chemotherapy with 
cytotoxic antitumor agents alone or in combination with 
endocrine therapy, bisphosphonates, and HER-2/neu directed 
therapy (trastuzumab; ref. 1), over 40,000 women will die from 
breast cancer in the United States in 2004 (8). Thus, new 
molecular targets must be defined as a first step leading to the 
development of novel therapeutic strategies for the treatment of 
breast cancer. 

The current study is the first to examine the expression 
of B7-H4 protein in primary and metastatic breast cancer. In 



Fig. 2 Immunohistochemical detection of B7-H4 expression in 
breast cancer and normal breast tissue. Note strong cytoplasmic 
and circumferential membranous B7-H4 expression in both invasive 
ductal (A) and lobular (B) breast cancers. An identical pattern of 
B7-H4 expression is also present in metastatic breast cancer of an 
axillary lymph node (C). By contrast, predominantly apical, 
luminal membranous B7-H4 expression is observed in normal breast 
epithelium (D). A, B, and C, original magnification ×600; 
D, original magnification ×400. 

Table 2. Proportion of B7-H4-positive cells and median staining intensity of 173 invasive breast cancer cases compared with clinicopathologic variables

                                         No. cases (%) grouped by proportion              Staining intensity
                                         of B7-H4-positive cells                          (image analysis)
Variable                      No. cases  0-10%     >10-50%   >50-80%   >80-100%   P*      Median RU (range)    P
Grade†
  1                           29         3 (10)    5 (17)    0         21 (72)    0.088   19.92 (0-69.92)      0.205‡
  2                           56         13 (23)   7 (13)    4 (7)     32 (57)            16.60 (0-75.00)
  3                           70         10 (14)   3 (4)     7 (10)    50 (71)            26.17 (0-74.22)
Receptor status
  ER+                         137        24 (18)   16 (12)   10 (7)    87 (64)    0.255   17.58 (0-75.00)      0.099§
  ER-                         36         4 (11)    1 (3)     2 (6)     29 (81)            30.08 (0.78-74.22)
  PR+                         120        22 (18)   17 (14)   10 (8)    71 (59)    0.001   16.41 (0-75.00)      0.014§
  PR-                         53         6 (11)    0         2 (4)     45 (85)            30.86 (0-74.22)
  HER-2/neu+                  25         1 (4)     2 (8)     5 (20)    17 (68)    0.024   24.61 (0-54.69)      0.924§
  HER-2/neu-                  148        27 (18)   15 (10)   7 (5)     99 (67)            19.92 (0-75.00)
Tumor size (cm)
  ≤2                          108        17 (16)   15 (14)   9 (8)     67 (62)    0.063   19.92 (0-74.22)      0.120‡
  >2-5                        50         11 (22)   1 (2)     2 (4)     36 (72)            16.60 (0-75.00)
  >5                          15         0         1 (7)     1 (7)     13 (87)            35.94 (8.59-63.67)
No. lymph nodes with metastatic carcinoma
  0                           97         15 (15)   9 (9)     8 (8)     65 (67)    0.965   17.58 (0-69.92)      0.066‡
  1-3                         31         6 (19)    2 (6)     2 (6)     21 (68)            23.44 (0-59.38)
  >3                          21         2 (10)    1 (5)     1 (5)     17 (81)            30.08 (0.78-74.22)
  Unknown                     24         6 (25)    2 (8)     5 (21)    11 (46)            20.70 (0-66.02)
Stage
  I                           90         14 (16)   13 (14)   5 (6)     58 (64)    0.082   19.92 (0-69.92)      0.194‡
  IIa                         35         6 (17)    2 (6)     3 (9)     24 (69)            18.36 (0-75.00)
  IIb                         20         6 (30)    0         1 (5)     13 (65)            13.04 (0-59.38)
  IIIa                        16         0         2 (13)    0         14 (88)            36.32 (0.78-74.22)
  IIIb+                       12         2 (17)    0         3 (25)    7 (58)             16.80 (0-43.75)
Age at diagnosis (y)
  ≤50                         69         11 (16)   9 (13)    6 (9)     43 (62)    0.542   23.44 (0-74.22)      0.521§
  >50                         104        17 (16)   8 (8)     6 (6)     73 (70)            20.90 (0-75.00)
Neoadjuvant chemotherapy
  Yes                         17         1 (6)     0         2 (12)    14 (82)    0.235   41.80 (6.64-74.22)   0.004§
  No                          156        27 (17)   17 (11)   10 (6)    102 (65)           19.18 (0-75.00)

NOTE: Cases with an unknown lymph node status were excluded from the statistical analysis. 
Due to rounding, percentages in parentheses may not add up to 100%. 
*Fisher's exact test. 
†Ductal carcinoma including three cases subclassified as tubular carcinoma. 
‡Kruskal-Wallis test. 
§Wilcoxon rank sum test. 



addition, we evaluated B7-H4 protein expression in normal 
breast tissue and in a wide range of normal adult and fetal 
somatic tissues. B7-H4 circumferential membranous and 
cytoplasmic expression was observed in >95% of invasive 
breast cancer cases and was also detected in most nodal 
metastases. Univariate analysis showed a significant 
correlation between the proportion of B7-H4-positive cells and 
a negative status of PR and HER-2/neu. A significant 
association was also observed between B7-H4 staining 
intensity and negative PR status, and a history of treatment 
with neoadjuvant chemotherapy but not with other 
clinicopathologic variables or overall patient survival. The 
observed relationship between B7-H4 staining intensity level 
and negative PR status in primary breast cancer was not 
anticipated and could not be attributed to an indirect 
relationship with tumor grade. Thus, further studies are 
warranted to determine if this inverse association is due to 
other confounding clinicopathologic variables or could 
reflect a mechanistic link between B7-H4 expression and PR 
status. 

This study provided pivotal data but not definitive evidence 
to support the concept that B7-H4 could be a diagnostic marker 
or therapeutic target for breast cancer. Both HER-2/neu and B7- 
H4 are associated with the cell surface, an important consider- 
ation for potential antibody therapeutic targets (9, 10). B7-H4 



overexpression was detected in most breast carcinomas, 
including cases that were not candidates for hormonal or 
trastuzumab (Herceptin) therapy due to negative ER/PR and 
HER-2/neu status. By contrast, only weak apical cell surface 
expression of B7-H4 was seen in normal ductal and lobular 
breast epithelial cells. Focal apical membranous B7-H4 
expression was also observed in the distal convoluted tubules 
of the kidney, ductal cells, and rare acinar cells of the pancreas, 
endometrial glands, and in few other normal somatic tissues. 
Previous studies have indicated that HER-2/neu is also 
expressed in some normal somatic tissues, including renal 
tubular epithelium, pancreatic acinar cells, and endometrial 
glands (11-14). Thus, the limited expression of B7-H4 in a 
subset of normal tissues does not necessarily rule out a potential 
role for this protein as a therapeutic target for patients with breast 
cancer. 

Our findings that B7-H4 is not expressed in liver, small 
bowel, colon, and skeletal muscle are consistent with previous 
observations by Sica et al. (3) and Choi et al. (5). In contrast 
with our current study, however, Choi et al. reported that 
B7-H4 is not expressed in the lung, gallbladder, pancreas, kidney, 
ureter, urinary bladder, pituitary, or breast. The antibody used 
by Choi et al. was used at a dilution of 1:100 (final 
concentration not specified) and was noted to be reactive only 
with tissue frozen sections (5). By contrast, the A57.1 
Fig. 3 Median B7-H4 staining intensity values in primary invasive 
ductal carcinoma, invasive lobular carcinoma, and normal breast tissue. 
Horizontal bars, median staining intensity within each diagnostic 
category. 



monoclonal antibody in our study was used at a dilution of 
1:2,000 (final concentration, 0.8 μg/mL) and was reactive with 
both frozen sections and sections from archival formalin-fixed 
tissue blocks. Thus, the basis for the discrepancy in the 
detection of B7-H4 in some normal tissues from our study 
compared with previous observations by Choi et al. could be 
due to differences in the sensitivity of the immunohistochemical 
staining protocols or to differences in the B7-H4 antibodies 
that were used. 

Although the role of B7-H4 expression in malignant 
transformation or tumor progression has not been determined, 
B7 family members and their receptors are known to regulate 
antigen-specific immune response through inhibition of T-cell 
activation, cytokine secretion, and the development of cytotox- 
icity (2-6). Extensive laboratory and histopathologic data 
indicate that T-cell immune reactivity is a favorable prognostic 
indicator in nonmetastatic breast cancer but that the suppression 
of cell-mediated immunity could be critically involved in breast 
cancer progression (15-17). Thus, it is reasonable to 
hypothesize that B7-H4 overexpression could provide a mechanism for 
tumors to avoid detection by immune surveillance. In this light, 
we are currently focusing on experiments to determine if an 
antibody approach could inhibit tumor cell growth and/or reverse 
the postulated antitumor effects of B7-H4 on the immune 
system. 

In conclusion, this study showed that B7-H4 is 
consistently expressed in most primary and metastatic breast 
carcinomas. Although B7-H4 detection was associated with 
negative progesterone receptor status, negative HER-2/neu 
status, and history of neoadjuvant chemotherapy, B7-H4 
expression was independent of tumor grade, stage, or other 
clinicopathologic variables. The nearly ubiquitous expression 



of B7-H4 in breast carcinomas suggests that B7-H4 could be 
involved in breast cancer pathogenesis or tumor progression. 
Further studies, however, are indicated to evaluate the 
potential role of B7-H4 as a diagnostic marker or therapeutic 
target. 



Table 3 B7-H4 expression (no. positive cases) in 314 normal 
adult and fetal somatic tissue samples 

Tissue                                   B7-H4 positive (%)*

Normal adult tissue 
  Breast (ductal and lobular cells)      16/16 (100)
  Ovary                                  0/22 (0)
  Fallopian tubal epithelium             17/17 (100)
  Endometrial glands                     19/25 (76)
  Myometrium                             0/25 (0)
  Endocervical glands                    10/10 (100)
  Ectocervix                             0/10 (0)
  Thyroid                                0/5 (0)
  Parathyroid                            0/3 (0)
  Adrenal gland                          0/3 (0)
  Pancreas/chronic pancreatitis          10/10 (100)
  Salivary gland                         0/1 (0)
  Pituitary                              1/4 (25)
  Heart                                  0/5 (0)
  Larynx                                 2/3 (67)
  Lung                                   4/4 (100)
  [tissue name not legible]              0/5 (0)
  Stomach                                0/5 (0)
  Duodenum                               0/4 (0)
  Ileum                                  0/7 (0)
  Colon/cecum                            0/4 (0)
  Liver                                  0/4 (0)
  Gallbladder                            1/5 (20)
  Kidney (distal convoluted tubules)     5/11 (45)
  Ureter                                 2/3 (67)
  Urinary bladder mucosa                 4/4 (100)
  Testis                                 0/5 (0)
  Prostate                               0/5 (0)
  Abdominal peritoneum                   0/1 (0)
  Skin                                   0/5 (0)
  Hair follicle                          7/7 (100)
  Thrombus                               0/4 (0)
  Skeletal muscle                        0/4 (0)
  Synovial cyst                          0/1 (0)
  Bone marrow                            0/5 (0)
  Lymph node                             0/5 (0)
  Thymus                                 0/4 (0)
  Spleen                                 0/5 (0)
  Cerebral cortex                        0/3 (0)
  Cerebellum                             0/3 (0)
  Spinal cord                            0/5 (0)
  Eye                                    0/1 (0)

Normal fetal tissue 
  Amnion                                 3/8 (25)
  Chorion                                0/8 (0)
  Placental villi                        2/6 (33)
  Heart                                  0/1 (0)
  Lung                                   1/1 (100)
  Small bowel                            0/1 (0)
  Kidney                                 1/1 (100)
  Skin                                   0/3 (0)
  Hair follicle                          3/3 (100)
  Skeletal muscle                        0/1 (0)
  Cartilage                              0/2 (0)
  Adipose tissue                         0/1 (0)

*Cases with any detectable staining (minimal focal staining or 
greater) were scored as B7-H4 positive. 
Abstract 

B7-H4 protein is expressed on the surface of a variety of immune cells and functions as a negative regulator of T cell responses. We 
independently identified B7-H4 (DD-O110) through a genomic effort to discover genes upregulated in tumors and here we describe a new 
functional role for B7-H4 protein in cancer. We show that B7-H4 mRNA and protein are overexpressed in human serous ovarian cancers and 
breast cancers with relatively little or no expression in normal tissues. B7-H4 protein is extensively glycosylated and displayed on the surface 
of tumor cells and we provide the first demonstration of a direct role for B7-H4 in promoting malignant transformation of epithelial cells. 
Overexpression of B7-H4 in a human ovarian cancer cell line with little endogenous B7-H4 expression increased tumor formation in SCID 
mice. Whereas overexpression of B7-H4 protected epithelial cells from anoikis, siRNA-mediated knockdown of B7-H4 mRNA and protein 
expression in a breast cancer cell line increased caspase activity and apoptosis. The restricted normal tissue distribution of B7-H4, its 
overexpression in a majority of breast and ovarian cancers and functional activity in transformation validate this cell surface protein as a new 
target for therapeutic intervention. A therapeutic antibody strategy aimed at B7-H4 could offer an exciting opportunity to inhibit the growth 
and progression of human ovarian and breast cancers. 
© 2005 Elsevier Inc. All rights reserved. 
Introduction 

Breast and ovarian cancer are the second and fourth 
leading cause, respectively, of female cancer deaths in the 
United States [1]. While the lifetime probability of 
developing breast cancer and the incidence are significantly 
higher than for ovarian cancer, the 5 year survival rate for 
breast cancer patients is notably better than for those with 
ovarian cancer [1]. Advances in understanding the 
fundamental biology and signal transduction pathways that 
regulate normal and malignant breast cell biology have 
enabled a variety of adjuvant therapeutic strategies leading 
to improved response rates and increased survival [2,3]. 
Although significant progress has been made in diagnosis 
and treatment of breast cancers, there is considerable 
opportunity for improvement and a need for additional 
therapeutic options [4]. Ovarian cancer is highly curable 
when discovered at an early stage yet a majority of these 
cancers progress undetected within the peritoneum resulting 
in diagnosis at later stages leading to a high mortality rate 
within a relatively short period of time [5,6]. Advances in 
surgery and new chemotherapeutic drugs have led to modest 
improvements in initial response and survival but a majority 
of ovarian cancer patients will relapse and die from the 
disease [7]. Consequently, there is an urgent need for early 
diagnostic tools as well as new treatment modalities. 

The B7 protein family provides both stimulatory and 
inhibitory regulation of T cell responses, depending on 
which B7 ligand and receptor are engaged on the target cell 
[8,9]. B7-H4, also known as B7x or B7S1, is a recently 
discovered member of the B7 family [10-12]. B7-H4 was 
shown to be a negative regulator of T cell responses in vitro 
by inhibiting proliferation, cell-cycle progression and 
cytokine production of CD4+ and CD8+ T cells [10-12]. 
Antigen-specific T cell responses were also impaired in 
mice upon treatment with a B7-H4Ig fusion protein [12]. 
Conversely, blockade of endogenous B7-H4 in mice using a 
neutralizing antibody enhanced the generation of allogeneic 
CTL [12]. Based on flow cytometry analysis, expression of 
B7-H4 protein was reported to be inducible upon 
stimulation of T cells, B cells, monocytes and dendritic cells 
whereas immunohistochemistry revealed little expression in 
several peripheral tissues with the exception of positive 
staining of some ovarian and lung cancers [12,13]. Thus, 
B7-H4 is postulated to attenuate inflammatory responses 
and perhaps serves a role in down-regulation of anti-tumor 
responses [10-13]. 

As an approach to discover and develop new diagnostic 
and therapeutic targets for breast and ovarian cancer, we used 
genomic strategies to identify genes with increased expres- 
sion in human cancer compared to normal tissues. These 
efforts yielded a gene encoding a predicted membrane 
glycoprotein of unknown annotation, designated as 
DD-O110, whose mRNA is overexpressed in ovarian and breast 
cancers. DD-O110 protein is heavily glycosylated, displayed 
on the surface of tumor cells and overexpressed on a majority 
of serous ovarian carcinomas and breast cancers with little or 
no normal tissue expression. Based on database homology 
searches, we determined that DD-O110 is the same as the 
recently discovered immunomodulatory protein, B7-H4. 
Whereas B7-H4 has been implicated in regulation of immune 
function, our data add a new dimension to the functional role 
of B7-H4 in cancer by demonstrating that this protein can 
promote transformation and tumor formation when 
overexpressed in tumor epithelial cells. Together, our findings 
support a therapeutic antibody strategy targeting B7-H4 for 
cancer treatment. 



Materials and methods 

Discovery of DD-O110 (B7-H4), a gene upregulated in 
ovarian cancer 

Proprietary bioinformatic algorithms were used to mine 
the cDNA database LifeSeq® (Incyte Corporation, 
Wilmington, DE) for ESTs that were present preferentially in 
human ovarian cancer cDNA libraries, with low abundance in 
libraries from any other tissue type and disease state including 
normal ovary. Using these criteria, DD-O110 was discovered 
and full length clones were obtained from ovarian cancer 
cDNA using standard molecular biology methods. 
Comparison of the DD-O110 nucleotide sequence and predicted 
protein with public databases initially revealed identity to a 
novel protein (Genbank: gi 10438801), predicted to be 
transmembrane, called FLJ22418. Based on routine 
database searches, we subsequently determined that DD-O110 
was identical to the recently published B7-H4 (B7x, B7S1) 
[10-12]. 

Real-time quantitative RT-PCR (QPCR) 

Tissue samples were purchased from various commercial 
sources including Zoion Diagnostics (Hawthorne, NY), 
Kaplan Comprehensive Cancer Center (New York, NY) and 
National Disease Research Exchange (Philadelphia, PA). 
Total RNA from gross tissue was prepared using Trizol® 
RNA isolation reagent (Life Technologies, Grand Island, 
NY) and treated with RNase-free Deoxyribonuclease I (Life 
Technologies). For cDNA synthesis, 9 μg of RNA was added 
to plus-RT reaction buffer containing 50 mM KCl, 10 mM 
Tris-HCl pH 8.3, 5.5 mM MgCl2, 200 μM of each 
deoxynucleotide triphosphate, 2.5 μM random hexamers, 
0.4 U/μl RNase inhibitor, 1.3 U/μl MuLV reverse 
transcriptase and DEPC treated water (Ambion, Austin, TX) in a 
final volume of 500 μl. Reverse transcription reagents were 
obtained from Perkin-Elmer (Foster City, CA). A scaled 
negative control reverse transcription was set up using 1 μg 
of RNA with the same reagents except without MuLV reverse 
transcriptase. Reverse transcription was performed in a 
GeneAmp 9700 cycler (PE Applied Biosystems, Foster 
City, CA) with a 10 min incubation at 25°C, 1 cycle of reverse 
transcription at 48°C for 30 min and enzyme inactivation at 
95°C for 5 min. Following reverse transcription, the RNA/ 
cDNA mixture from the plus-RT and minus-RT reaction 
volumes was brought down to 1 ng/μl with Tris-EDTA 
buffer pH 7.0 (BioWhittaker, Walkersville, MD). 

QPCR was carried out on an ABI Prism® 7700 Sequence 
detection system (PE Applied Biosystems, Foster City, CA) 
in MicroAmp® optical 96-well plates, using the TaqMan® 
Universal PCR master mix according to the manufacturer's 
directions. The amplification used 10 ng of template with 
primers and probes designed for B7-H4 (forward primer 
5'CACCAGGATAACATCTCTCAGTGAA3', reverse 
primer 5'TGGCTTGCAGGGTAGAATGA3', and probe 
5'AAGCTGAAGATAATCCCATCAGGCAT3'); and the 
endogenous internal control ATP synthase 6 (forward primer 
5'CAGTGATTATAGGCTTTCGCTCTAA3', reverse primer 
5'CAGGGCTATTGGTTGAATGAGTA3', and probe 
5'AGCCCACTTCTTACCACAAGGCACA3'). Primers 
and probes were synthesized by Megabases (Evanston, IL) 
and OligoFactory-PE Biosystems (Foster City, CA). 
Expression levels are represented relative to one sample named 
calibrator that becomes the 1× sample, and mRNA levels in 
all other samples are expressed as an n-fold difference 
relative to the calibrator (ABI Prism 7700 Sequence 
Detection System User Bulletin #2). 
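
The calibrator-relative quantification described above can be illustrated with a short Python sketch. It assumes the comparative Ct (2^-ΔΔCt) calculation, which is the relative quantification approach commonly associated with User Bulletin #2; the text does not state which formula was actually applied, and the function name and all Ct values below are hypothetical.

```python
def fold_change(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Return the n-fold expression of a sample relative to the calibrator.

    Assumes the comparative Ct method (2^-ddCt) with roughly equal
    amplification efficiencies for the target (here B7-H4) and the
    endogenous control (here ATP synthase 6).
    """
    # Normalize the target to the endogenous control in each sample
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    # Compare the sample with the calibrator (the 1x sample)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only
calibrator_fold = fold_change(30.0, 20.0, 30.0, 20.0)  # calibrator vs itself
tumor_fold = fold_change(26.0, 20.0, 30.0, 20.0)       # 4 cycles earlier
print(calibrator_fold, tumor_fold)
```

Under this assumption a sample whose normalized Ct is four cycles lower than the calibrator's is reported as a 16-fold difference, and the calibrator itself is the 1× sample.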

Cell lines 

All cell lines were purchased from the American Type 
Culture Collection (Manassas, VA) and grown according to 
supplied specifications. 
siRNA oligonucleotides 

A siRNA was designed based on the open reading 
frame of the B7-H4 mRNA using methods previously 
described [14,15]. A random "scrambled" siRNA was 
used as a negative control. We also used as an additional 
negative control a siRNA targeting Emerin [16] and as a 
positive control for knockdown of an mRNA inducing 
apoptosis, a siRNA targeting DAXX [17]. A BLAST 
search against the human genome was performed with 
each siRNA sequence to ensure that the siRNA was 
target-specific. All siRNA molecules (HPP purified grade) 
were chemically synthesized by Xeragon Inc. 
(Germantown, MD). siRNAs were dissolved in sterile buffer, 
heated at 90°C for 1 min and incubated at 37°C for 1 h 
prior to use. siRNA oligonucleotides were: 

B7-H4: sense 5'-GGUGUUUUAGGCUUGGUCC(dTdT)-3' 

Emerin: sense 5'-CCGUGCUCCUGGGGCUGGG(dTdT)-3' 

Scrambled: sense 5'-UUCUCCGAACGUGUCACGU(dTdT)-3' 

DAXX: sense 5'-GGAGUUGGAUCUCUCAGAA(dTdT)-3' 

Transfection with siRNA oligonucleotides 

6 × 10^4 SKBR3 cells were seeded in 12-well plates 
for 18-24 h prior to transfection. A final concentration of 
100 nM siRNA (except 200 nM DAXX siRNA) and 1.5 
μl Oligofectamine reagent (Invitrogen) were used per well 
of cells for transfection according to the manufacturer's 
protocol. siRNAs were transfected in triplicate for all 
experiments. Parallel wells of cells were evaluated 72 h 
after transfection for mRNA levels by QPCR, protein 
levels by Western blot and apoptosis by two different 
assay systems (see below). All findings were confirmed 
in at least 3 independent experiments. A QuantiTect 
SYBR Green RT-PCR kit (Qiagen Inc.) was used for 
QPCR evaluation of mRNA knockdown. Between 20 and 
40 ng of template RNA was used per reaction. QPCR 
was performed using an ABI Prism® 7700 Sequence 
detection system as above. 

Apoptosis assays 

Two different assay kits were used to evaluate 
apoptosis. With the "Apo-ONE Homogeneous Caspase- 
3/7 Assay" kit (Promega Inc., Madison, WI), cells were 
solubilized directly on the culture plate and caspase 
activity, reflected by a fluorescent readout, was measured 
according to supplier's instructions. With the "Guava 
Nexin V-PE Kit" (Guava Technologies Inc.), cells were 
harvested by trypsinization, washed and approximately 
10^5 cells were resuspended in 40 μl provided buffer and 
5 μl each Annexin V (+) and 7-AAD (-) added. 
Following 20 min incubation on ice, cells were analyzed 
using the Guava PCA flow cytometer according to 
manufacturer's instructions. For the anoikis assay, RK3E 
cells were trypsinized and resuspended in FBS-free media 
at 200,000 cells/ml, 1 ml aliquots were plated per well of 
a 12-well plate coated with poly-HEMA (Sigma-Aldrich) 
and incubated at 37°C for 24 h. Cells were collected and 
evaluated using the Guava-Nexin V-PE kit. 

SDS-PAGE and Western blot analysis 

72 h after transfection with siRNA, cell extracts were 
prepared on ice using solubilization buffer (1% NP40, 10 
mM Na2PO4, 0.15 M NaCl) plus a complete protease 
inhibitor cocktail (Roche Inc.). Extracts from virus- 
infected or untransfected cells were similarly prepared. 
Pathology-verified, snap-frozen, minced tumor tissue from 
serous ovarian adenocarcinoma, normal ovary, breast 
ductal adenocarcinoma and corresponding normal 
adjacent tissue (Ardais Corporation, Lexington, MA) or normal 
adult tissues (ovary, spleen, bladder, kidney, liver, heart 
from Zoion Diagnostics) were homogenized in extraction 
buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1 mM 
EDTA, 1.0% NP40 and 0.25% DOC 0.5% plus complete 
protease inhibitors) followed by sonication and 
centrifugation to clarify the extracts. Protein extracts from human 
fetal normal tissues were purchased from Biochain Inc. 
(Hayward, CA). Between 20 and 50 μg of protein extract 
was used per gel lane; protein equivalent concentrations 
were evaluated for protein level comparisons on the same 
gel. Pre-cast 4-12% SDS-polyacrylamide minigels with 
MES running buffer (Nupage; Invitrogen) were used. 
Gels were transferred to Immobilon-P PVDF membranes 
(0.45 μm pore size; Invitrogen) using 1× Nupage transfer 
buffer plus 10% methanol. Membranes were blocked 
using 5% nonfat dry milk in PBS with 0.05% Tween-20 
(PBSMT) and incubated with primary antibody overnight 
in PBSMT. A mouse monoclonal antibody directed 
against B7-H4, A57.1, was produced in-house using 
recombinant B7-H4 protein [18] and was used at a final 
concentration of 1 μg/ml. A mouse monoclonal antibody 
against GAPDH (Chemicon Inc.) was used at a final 
concentration of 2 μg/ml. Following primary antibody 
incubation, membranes were washed in PBS with 0.05% 
Tween-20 (PBST) and incubated with horseradish 
peroxidase linked goat anti-mouse immunoglobulin (Jackson 
Lab Inc.) at a 1:10,000 dilution in PBSMT. Membranes 
were washed with PBST followed by detection using 
enhanced chemiluminescence (ECL) reagent per 
manufacturer's directions (Amersham). 

Enzymatic deglycosylation 

Deglycosylation experiments were performed on 
protein extracts from MCF7 cells and two human serous 
ovarian tumor tissues using Peptide N-Glycosidase F 
(New England Biolabs, Inc., Beverly, MA) as per the 
manufacturer's directions. Treated samples along with 
untreated control samples handled in parallel were then 
analyzed by Western blot. 

Cell surface biotinylation 

Cell monolayers were washed with cold PBS and 
incubated on ice for 30 min with a final concentration of 
0.5 μg/ml Sulfo-NHS-SS-Biotin (Pierce) in PBS. Cells 
were washed several times with PBS plus 25 mM Tris 
and then with PBS followed by extraction with 
solubilization buffer. Clarified supernatants were 
immunoprecipitated with streptavidin agarose (Pierce) followed by 
Western blot analysis. 

Immunofluorescence 

Cells, seeded on 18 x 18 mm glass coverslips, were fixed 
with 3.7% formaldehyde in PBS without permeabilization 
and then incubated for 30 min at room temperature with the 
A57.1 antibody at a final concentration of 10 μg/ml. Cells 
were next washed and incubated with a secondary Cy3- 
labeled donkey anti-mouse (Jackson Immunoresearch 
Laboratories, West Grove, PA) at a concentration of 10 μg/ml for 
30 min. After washing, cells were mounted in a medium 
containing DAPI (Vectastain, Vector, Burlingame, CA) and 
observed using a Zeiss fluorescence microscope Axioskop2 
equipped with appropriate fluorescent filters. Micrographs 
were obtained with an Axiocam camera. 

Immunohistochemistry 

Human breast and ovarian tumors and corresponding 
normal tissues were obtained from National Disease 
Research Exchange (Philadelphia, PA). The human tissues 
or tumor xenograft samples were fixed in 10% neutral 
buffered formalin for 24 h then embedded in paraffin. Six- 
micrometer-thick sections were baked at 50°C, 
deparaffinized in Histo-clear (HS-200, National Diagnostics, Atlanta, 
GA) and rehydrated through decreasing ethanol 
concentrations into PBS. Antigen unmasking was performed by Heat 
Induced Epitope Retrieval (HIER) in a decloaking chamber 
(Biocare, Walnut Creek, CA) for 10 min in 20 mM sodium 
citrate buffer (pH 6.0) at 120°C, 15-17 PSI. Endogenous 
peroxidase activity was quenched with 3% hydrogen 
peroxide solution for 15 min. Slides were stained with a 
final concentration of 1 μg/ml A57.1 antibody using 
PowerVision™ IHC Homo Kit (ImmunoVision Technologies Co., 
Brisbane, CA) as directed. Staining was visualized by 
incubation with 3,3'-diaminobenzidine chromogen for 2-5 
min and counterstaining with hematoxylin followed by 
dehydration and mounting in permount medium (Micro 
Mount 2000™, American Master*Tech Scientific, Inc., Lodi, 
CA). 



Expression vector construction 

B7-H4 cDNA was sub-cloned into the pLXSN 
retrovirus vector (BD Bioscience/Clontech) and sequence 
verified. The MLV LTR promotes expression of B7-H4 
cDNA and an SV40 promoter drives expression of a Neo 
gene for G418 resistance. pLAPSN, a retroviral vector 
encoding alkaline phosphatase (AP), was purchased from 
BD Bioscience/Clontech (pLXSN-AP). 

Virus production 

Ecotropic virus was used to infect RK3E cells and 
amphotropic virus for SKOV3 cells. The pVpack-Eco 
plasmid (Stratagene) and pVpack-Ampho plasmid (Strata- 
gene) were used for ecotropic and amphotropic virus pack- 
aging, respectively. 293T cells were seeded at 8 x 10 cells 
per well of a 6 well dish onto Biocoat collagen coated plates 
(BD). Twenty-four hours later, cells were transfected with 
plasmids using Lipofectamine with PLUS reagent (Invitro- 
gen) according to the manufacturer's recommendations. 
Retroviral vectors, pLXSN-B7-H4 or pLXSN-AP plus 
pVpack-Eco/Ampho and pVpackGP (Stratagene), were 
transfected for 3 h after which the cells were grown overnight 
in DMEM containing 20% FBS. The medium was changed to 
DMEM with 10% FBS and virus-containing media were 
harvested 24 h later and filtered through a 0.45 μm 
polysulfonic filter. 

Virus infection and selection 

A final concentration of 4 μg/ml polybrene 
(Hexadimethrine Bromide; Sigma, St Louis, MO) was added to fresh virus- 
containing medium. Cells plated the day before at a density of 
3 × 10^5 cells per 100 mm² dish were washed with phosphate- 
buffered saline including Ca2+ and Mg2+ (Cellgro). Virus 
containing medium was applied to the cells and incubated for 
3 h at 37°C. The medium was replaced by fresh growth 
medium and the cells incubated at 37°C for 48-72 h at which 
point a final concentration of 350 ug/ml of G418 sulfate 
(Cellgro) was added. Following G418 selection, pools of 
cells were used for all subsequent experiments. Expression of 
ectopic proteins in the virus-infected, selected cells was 
verified by Western blot and expression of AP was monitored 
by staining. 

Tumor xenograft experiments 

G418-selected pools of SKOV3 cells infected with 
either a retrovirus expressing B7-H4 or a control 
retrovirus were injected subcutaneously into SCID/Beige 
mice (Charles River Laboratories). Ten mice were used 
per group and 10^7 cells in 100 μl PBS were implanted 
with matrigel. 100% of the mice injected with tumor cells 
developed tumors and tumor formation was monitored by 
palpation and caliper measurement. Tumor volume was 
calculated using the formula: (length × width^2) / 2. Data 
are expressed as mean group tumor volume over time. 
All animal experiments were performed in complete 
compliance with institutional guidelines. 
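
The volume formula and the group mean reported above can be written out as a minimal Python sketch. The function names, the unit (millimeters) and the caliper readings are hypothetical and are used only to show the arithmetic.

```python
def tumor_volume(length, width):
    """Volume from caliper measurements: (length x width^2) / 2."""
    return (length * width ** 2) / 2.0


def mean_group_volume(measurements):
    """Mean tumor volume for one group at one time point.

    `measurements` is a list of (length, width) pairs, one per mouse.
    """
    volumes = [tumor_volume(length, width) for length, width in measurements]
    return sum(volumes) / len(volumes)


# Hypothetical caliper readings in mm for two mice of one group
group = [(10.0, 6.0), (8.0, 4.0)]
print(mean_group_volume(group))
```

Repeating the group mean at each measurement day gives the "mean group tumor volume over time" that the text describes.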

Statistical analysis of tumor xenograft data 

A single factor ANOVA was performed to test whether 
on the last day of measurement the tumor volumes 
between control and B7-H4 groups differed. The results 
indicated a >99.0% probability that the two groups do 
not have the same tumor volume. Furthermore, pairwise 
two-sample t tests assuming unequal variances with 
Bonferroni correction analysis were performed comparing 
the SKOV3-B7-H4 tumors to the SKOV3-control tumors. 
Analysis of data from the last day of measurement 
revealed that the SKOV3-B7-H4 tumors had significantly 
larger volumes than SKOV3-control tumors at a 99.0% 
confidence level. 
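
As an illustration of the two-sample t test assuming unequal variances (Welch's test) with a Bonferroni correction, the following Python sketch computes the test statistic, the Welch-Satterthwaite degrees of freedom and a Bonferroni-adjusted significance threshold. It is a sketch of the general technique only: the software and exact procedure used by the authors are not stated, the function names are hypothetical, and the volumes are invented numbers rather than study data. Converting the statistic to a P value additionally requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, degrees of freedom) for Welch's unequal-variance t test."""
    # Squared standard error contributed by each group (sample variance / n)
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# Invented tumor volumes, for illustration only
b7h4_group = [2.0, 4.0, 6.0, 8.0]
control_group = [1.0, 2.0, 3.0, 4.0]
t_stat, dof = welch_t(b7h4_group, control_group)
print(t_stat, dof, bonferroni_alpha(0.01, 2))
```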



Results 

Discovery of DD-O110 (B7-H4), a gene upregulated in 
cancer 

To identify genes that are upregulated in ovarian 
cancer with restricted normal tissue expression we used 
bioinformatic algorithms to search for Expressed 
Sequence Tags (ESTs) that were present preferentially 
in ovarian cancer cDNA libraries, with low abundance in 
cDNA libraries from any other disease state or tissue 
type including normal ovary. Based on these criteria, one 
sequence identified through this comprehensive mining 
approach was named DD-O110. DD-O110 was also 
identified in a breast cancer cDNA subtraction 
experiment. Cloning and sequencing of a DD-O110 cDNA 
revealed that it encoded a predicted novel type I 
membrane glycoprotein with two immunoglobulin 
domains (Fig. 1a). Upon completion of our 
characterization and validation studies, we determined that 
DD-O110 is the same as the recently discovered negative 
regulatory member of the B7 family, B7-H4 (B7x, B7S1) 
[10-12]. We will henceforth refer to human DD-O110 as 
B7-H4 for consistency with the published literature. 

B7-H4 mRNA is over-expressed in breast and ovarian 
cancers with minimal normal tissue expression 

B7-H4 mRNA expression was evaluated using QPCR 
with a panel of 190 mRNA samples representing a 
variety of human cancer and normal tissues. 100% (15/ 
15) of ductal breast adenocarcinomas and 100% (4/4) of 
lobular breast adenocarcinomas showed more than 2-fold 
overexpression of B7-H4 mRNA compared to a pool of 9 
normal breast samples (Fig. 1b). Evaluation of 13 ovarian 
cancer samples of various subtypes and 13 normal 
ovarian tissues revealed that 88% (7/8) of the papillary 
serous adenocarcinomas expressed B7-H4 mRNA at least 
2-fold higher than the average expression calculated from 
the 13 normal ovarian samples (Fig. 1c). The remaining 
ovarian cancer subtypes did not show elevated B7-H4 
mRNA expression (Fig. 1c). Thirteen additional cancer 
types matched with normal adjacent tissue did not show 
significant expression of B7-H4 mRNA with the 
exception of uterine endometrial cancer where B7-H4 was 
overexpressed in 64% of the tumor samples (data not 
shown). Minimal expression of B7-H4 mRNA was 
detected in 12 normal tissue types including breast, 
colon, endometrium, kidney, liver, pancreas, prostate, 
small intestine, spleen, stomach, testis and uterus 
(Fig. 1c). In a separate experiment, these results were 
confirmed along with 11 additional normal tissues 
(adrenal, brain, cervix, esophagus, heart, lung, skeletal 
muscle, placenta, rectum, thymus and trachea) which also 
showed little or no B7-H4 mRNA expression (data not 
shown). Together, these data show that B7-H4 mRNA is 
overexpressed in breast and serous ovarian cancers with 
low or no expression in a variety of normal tissues or 
other cancer types. 
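
The proportions quoted above (for example 7/8, reported as 88%) follow from counting the samples whose relative expression exceeds a 2-fold cutoff over the normal baseline. A minimal Python sketch of that tally is given below; the function name and the fold-change values are hypothetical, and the strict "more than 2-fold" comparison is an assumption, since the text uses both "more than 2-fold" and "at least 2-fold".

```python
def fraction_overexpressed(fold_changes, threshold=2.0):
    """Fraction of samples whose fold change over normal exceeds threshold."""
    if not fold_changes:
        raise ValueError("at least one sample is required")
    hits = sum(1 for fold in fold_changes if fold > threshold)
    return hits / len(fold_changes)


# Hypothetical fold changes relative to the normal baseline
serous_samples = [12.0, 5.5, 3.1, 40.0, 2.4, 9.0, 1.2, 7.7]
print(fraction_overexpressed(serous_samples))
```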

Detection ofB7-H4 ptvtein In tumor cell lines and In human 
ovarian and breast tumor tissues 

A mouse monoclonal antibody, A57.1, was generated 
that specifically recognizes B7-H4 protein using a variety of 
detection methods. Protein extracts from a set of cancer cell 
lines with and without B7-H4 mRNA expression (data not 
shown) were evaluated by Western blot with the A57.1 
antibody. A strong, diffuse band in the 50-80 kDa size 
range was detected in the B7-H4 mRNA positive but not 
negative cell lines (Fig. 2A). The size of the major B7-H4 
protein form varied between cell lines and a protein of 
approximately 28 kDa was also detected in some cells (Fig. 
2A). Western blot analysis of three serous ovarian cancers 
compared to histologically normal ovarian tissue showed a 
diffuse B7-H4 protein band in the 50-80 kDa range in 2/3 
cancers and none of the normal samples (Fig. 2B). Similarly, 
Western blot evaluation of three breast cancers compared to 
histologically normal adjacent tissue showed several prominent 
B7-H4 protein species in the 40-80 kDa range in 2/3 
tumor samples (Fig. 2B). Faint B7-H4 protein bands were 
also detected in the third breast tumor sample as well as in 
the three normal adjacent tissue samples (Fig. 2B). No B7- 
H4 protein was detected in protein extracts of normal adult 
ovary, spleen, bladder, kidney, liver and heart (Fig. 2C). The 
detection of B7-H4 protein in human breast and ovarian 
cancers but not most normal adult tissues by Western blot is 
in good agreement with the mRNA expression data. A 
survey of human fetal tissues by Western blot detected B7- 
H4 protein in kidney and placenta but not in other tissues 
tested (Fig. 2D). 

Fig. 1. (a) Schematic representation of the predicted DD-O110 (B7-H4) protein. (b) QPCR analysis of B7-H4 mRNA expression in ductal and lobular breast 
cancer tissues compared to pooled normal breast tissue "N". The graph shows relative mRNA expression levels in the indicated samples. (c) QPCR analysis of 
B7-H4 mRNA expression in ovarian cancer, normal ovary and other normal tissues. Serous ovarian cancer "1-8", mucinous and low malignant potential 
ovarian cancers "A-E", normal ovary "1-13" and pooled samples from normal breast "B", colon "C", endometrium "E", kidney "K", lung "L", pancreas "Pa", 
prostate "Pr", small intestine "SI", spleen "Sp", stomach "St", testis "T" and uterus "U". 

The expression of B7-H4 was evaluated further by 
immunohistochemical analysis of human tissues with the 
A57.1 antibody. Intense circumferential membrane and 
cytoplasmic staining of a majority of the tumor epithelium 
was observed in sections of ovarian serous adenocarcinoma 
and breast ductal adenocarcinoma (Fig. 2E, panels c,d). No 
B7-H4 staining was seen with normal ovarian tissue (Fig. 
2E, panel a) whereas less intense staining restricted to the 
apical cytoplasmic membrane of ductal and lobular cells 
was observed in normal breast tissue (Fig. 2E, panel b). No 
staining of ovarian or breast tumor tissues was detected 
using an isotype matched control antibody (Fig. 2E, panels 
e, f). These immunohistochemical results are consistent with 
the mRNA and Western blot data described above and taken 
together the data show that B7-H4 mRNA and protein are 
over-expressed in human ovarian and breast tumor tissue 
compared to corresponding normal tissue. 

B7-H4 is glycosylated 

The protein backbone of B7-H4 is predicted to be 28.8 
kDa after signal peptide cleavage, significantly smaller than 
the B7-H4 protein sizes observed by Western blot. The 
predicted B7-H4 protein sequence contains seven potential 

Fig. 2. Detection of B7-H4 protein in human tumor cell lines and tumor tissues by Western blot and immunohistochemistry. Protein extracts of tumor 
cell lines and tissues were evaluated by SDS-PAGE followed by Western blot using the A57.1 antibody against B7-H4 and an antibody against GAPDH to 
verify the integrity of the tissue samples. (A) B7-H4 mRNA positive cell lines (MCF7, T47D, BT474, MDA-MB453, SKBR3) and B7-H4 mRNA negative cell 
lines (CaOV3, HeLa, Hec1a, A549, NCIH522, SW480, CaCo2, LoVo, HT29). (B) Serous ovarian adenocarcinomas and breast ductal adenocarcinomas with 
histologically normal ovarian tissue (Normal) or histologically normal adjacent breast tissue from the same patients (NAT) alongside SKBR3, HeLa and HT29 
cells. (C) Adult normal tissues alongside MCF7 and CaOV3 cells. (D) Fetal normal tissues alongside MCF7 and CaOV3 cells. (E) Immunohistochemical 
staining of normal ovary (a), normal breast (b), ovarian serous adenocarcinoma (c, e) and breast ductal adenocarcinoma (d, f) with the A57.1 antibody (a, b, c, 
d) or an isotype matched control antibody (e, f). The arrows indicate a normal ovarian follicle (a) and normal breast epithelium (b). 

N-linked glycosylation sites which could account for the 
increased size and diffuse appearance of the protein. To test 
this possibility, protein extracts from MCF7 cells and two 
serous ovarian cancer tissues expressing B7-H4 were treated 
with PNGaseF, an enzyme that removes N-linked carbohydrates, 
followed by Western blot analysis with the A57.1 
antibody. Treatment with PNGaseF reduced the B7-H4 
proteins to a distinct band of approximately 28 kDa, the 
predicted size of the B7-H4 protein without post-translational 
modification (Fig. 3). These data indicate that B7-H4 
is N-glycosylated and, based on the heterogeneous species 
of B7-H4 detected in different cells and tissues, suggest that 
the glycosylation varies between different tumors. 
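
Potential N-linked glycosylation sites of the kind mentioned above are conventionally recognized by the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below shows one way such a scan can be written; the function name and the toy sequence are hypothetical, the sequence is not that of B7-H4, and a sequon match indicates only a potential site, not confirmed glycosylation.

```python
import re

# N-linked glycosylation consensus sequon: Asn-X-Ser/Thr, X != Pro.
# The lookahead keeps the match one residue wide so that overlapping
# sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glyc_sites(protein_seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    seq = protein_seq.upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


# Hypothetical toy sequence for illustration only.
toy_sequence = "MANGSLLNPTAKNNTSW"
print(find_n_glyc_sites(toy_sequence))
```

In the toy sequence the asparagine followed by proline is skipped, which is the reason for excluding proline at the X position.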

Identification of B7-H4 protein on the surface of breast 
tumor cell lines 

Since B7-H4 is predicted to be a transmembrane protein, 
we examined whether B7-H4 could be detected on the cell 
surface. Live SKBR3 (B7-H4 protein positive) and HT29 
(B7-H4 protein negative) cells were biotinylated to label the 
cell surface proteins. The cells were solubilized and 
biotinylated proteins were precipitated from the mixture 
with avidin agarose followed by Western blot analysis. B7- 
H4 protein was detected in the biotinylated fraction of 
SKBR3 but not HT29 cells thus demonstrating its cell 
surface localization (Fig. 4a, top panel). NaK-ATPase, a 
known cell surface protein, was used as a positive control 
and was readily detected in the biotinylated fraction of both 
SKBR3 and HT29 cells (Fig. 4a, middle panel). To ensure 
that internal cell proteins were not biotinylated, GAPDH, an 
abundant cytoplasmic protein, was evaluated as a negative 
control. As expected, GAPDH protein was not detected in 
the biotinylated fractions whereas it was readily detected in 
total cell lysates (Fig. 4a, bottom panel). Similar results were 
obtained upon biotinylation of MCF7 cells which also 
express B7-H4 protein (data not shown). 

We next performed immunofluorescence to visualize the 
subcellular localization of B7-H4 in SKBR3 cells. Fixed, 
unpermeabilized SKBR3 cells as well as the B7-H4 

Fig. 3. B7-H4 is glycosylated. Protein extracts of MCF7 cells and two 
different serous ovarian cancer tissue samples were treated either with (+) or 
without (-) PNGaseF to remove N-linked carbohydrate. Samples were 
evaluated by SDS-PAGE followed by Western blot with the A57.1 
antibody. 

negative HT29 cells were stained with the A57.1 antibody. 
Specific plasma membrane labeling was observed with 
SKBR3 cells while the HT29 cells were negative (Fig. 4b). 
In agreement with these data, flow cytometry analysis using 
several different monoclonal antibodies against B7-H4, 
detected the protein on the surface of live SKBR3 cells 
(data not shown). 

Knockdown of B7-H4 leads to increased tumor cell 
apoptosis 

To evaluate the functional role of B7-H4 expression in 
tumor cells, we tested whether siRNA mediated knockdown 
of B7-H4 in the SKBR3 breast cancer cell line would lead to 
apoptosis. A B7-H4-specific siRNA diminished the level of 
B7-H4 mRNA in SKBR3 cells by approximately 65% at 72 
h after transfection (Fig. 5a). A siRNA consisting of a 
scrambled sequence with no homology to any mRNAs 
based on a BLAST search was used as a negative control and had 
no effect on B7-H4 mRNA levels (Fig. 5a). Furthermore, 
neither the B7-H4-specific siRNA nor the scrambled siRNA 
induced any non-specific knockdown of GAPDH mRNA 
(data not shown). Western blot analysis confirmed that the 
B7-H4 siRNA treated SKBR3 cells exhibited a correspond- 
ing decrease in B7-H4 protein level with no effect on 
GAPDH protein (Fig. 5b). The effect of the siRNAs on 
apoptosis was evaluated using cells treated in parallel. The 
B7-H4-specific siRNA led to a significant increase in 
apoptosis measured with an Annexin V assay whereas the 
scrambled control siRNA had no effect (Fig. 5c). A siRNA 
specific for the anti-apoptotic DAXX protein was used as a 
positive control for apoptosis induction since siRNA- 
induced knockdown of DAXX was shown previously to 
result in apoptosis of cultured tumor cells [17]. In SKBR3 
cells, the DAXX siRNA led to approximately 65% knockdown 
of its corresponding mRNA (data not shown) in 
conjunction with increased apoptosis (Fig. 5c), similar to 
published findings. The extent of B7-H4 siRNA induced 
apoptosis in SKBR3 cells was similar to that obtained with 
DAXX siRNA (Fig. 5c). As an additional control to exclude 
any non-specific apoptotic effects of the B7-H4 siRNA, a 
similar experiment was performed using the HT29 tumor 
cell line that does not express any B7-H4 mRNA or protein. 
Treatment of the HT29 cells with the B7-H4 siRNA did not 
induce apoptosis whereas the DAXX siRNA did (data not 
shown). 
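
The knockdown percentages quoted above express the loss of mRNA in siRNA-treated cells relative to control cells. A minimal sketch of that arithmetic follows, assuming mRNA levels have already been normalized to the control; the function name and the input values are hypothetical and are not taken from the study data.

```python
def percent_knockdown(treated_level, control_level):
    """Percent reduction of mRNA in siRNA-treated cells relative to control."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return (1.0 - treated_level / control_level) * 100.0


# Hypothetical normalized mRNA levels (control set to 1.0): a treated level
# of 0.35 corresponds to roughly 65% knockdown.
print(round(percent_knockdown(0.35, 1.0)))
```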

Increased caspase activity is frequently observed during 
apoptosis and serves as a sensitive indicator of induced cell 
death [19, 20]. The activity of caspase 3 and 7 was 
measured in SKBR3 cells treated with either scrambled, 
B7-H4- or DAXX-specific siRNA. Knockdown of B7-H4 
mRNA led to a significant increase in caspase activity 
indicating the induction of apoptosis and thus a functional 
role for this protein in the survival of the cells (Fig. 5d). 
Knockdown of DAXX mRNA also induced caspase activity 
as expected (Fig. 5d) [17]. QPCR performed in parallel 

Fig. 4. Detection of B7-H4 on the cell surface. (a) Monolayers of live 
SKBR3 (B7-H4 positive) or HT29 (B7-H4 negative) cells were surface 
biotinylated (+) or mock treated (-) followed by solubilization and 
precipitation with avidin-agarose. Avidin-affinity purified proteins were 
evaluated by SDS-PAGE followed by Western blot with antibodies against 
B7-H4 (top), NaK-ATPase (middle) or GAPDH (bottom). Protein 
equivalent samples of total cell lysate were evaluated in parallel. (b) 
Monolayers of SKBR3 (B7-H4 positive) or HT29 cells (B7-H4 negative) 
were used for immunofluorescence with the A57.1 antibody. 

Fig. 5. Knockdown of B7-H4 mRNA and protein expression in SKBR3 cells leads to increased apoptosis. (a) SKBR3 cells were untreated (No siRNA) or 
treated with either scrambled- or B7-H4-specific siRNA and B7-H4 mRNA levels were evaluated by QPCR. (b) B7-H4 and GAPDH protein levels in 
scrambled- (Sc) or B7-H4-siRNA treated cells were evaluated by SDS-PAGE followed by Western blot with an antibody against each protein. (c) Apoptosis 
induction, measured by an Annexin V assay and Guava detection system, was evaluated in parallel siRNA treated cells including cells treated with siRNA 
specific for DAXX. (d) A different knockdown experiment was performed using scrambled, DAXX-, Emerin- or B7-H4-specific siRNAs and caspase 3/7 
activity was measured. (e) B7-H4, Emerin and DAXX mRNA levels from the experiment in panel (d) were evaluated by QPCR using target-specific primers. A 
scrambled control was evaluated in parallel for each primer pair. 

demonstrated that the B7-H4 and DAXX-specific siRNAs 
were able to reduce their corresponding mRNA levels by 
approximately 60% and 65%, respectively (Fig. 5e). In 
another study, knockdown of the nuclear membrane protein 
emerin with a specific siRNA in mammalian cells indicated 
that this protein is non-essential for survival [16]. SKBR3 
cells were treated with siRNA specific for emerin as an 
additional specificity control for the effects of B7-H4 
siRNA on apoptosis. While emerin siRNA led to a 50% 
decrease in emerin mRNA levels, there was no effect on 
caspase activity (Figs. 5d, e). 

Overexpression of B7-H4 protects cells from apoptosis 

Since knockdown of B7-H4 in tumor cells led to an 
increase in apoptosis, we tested conversely whether 
overexpression of B7-H4 could protect cells from 
apoptosis. A rat epithelial cell line, RK3E, was used 
since this model system has been employed successfully 
to demonstrate the effects of a variety of genes and 
signaling pathways important for epithelial cancers, 
including β-catenin, Ras and c-Myc, on apoptosis and 
other parameters of transformation [21-23]. B7-H4 was 
ectopically expressed in RK3E cells and G418-selected, 
polyclonal pools of retrovirus-infected cells were used for 
subsequent experiments to avoid clonal artifacts. Ectopic 
expression of alkaline phosphatase (AP) was used as a 
negative control. Expression of B7-H4 protein in the 
G418-selected cell pools was verified by Western blot 
(Fig. 6a). No B7-H4 protein was detected in the AP- 
expressing RK3E cells (Fig. 6a). Infection and selection 
efficiency was evaluated by staining AP-infected cell 
monolayers for AP activity. Essentially, all of the selected 
cells were positive (data not shown) and, consequently, 

Fig. 6. Overexpression of B7-H4 protects cells from apoptosis. RK3E cells 
were infected with either a retrovirus expressing B7-H4 or an AP-control 
retrovirus followed by G418-selection. (a) Expression of B7-H4 protein in 
the selected cells was verified by SDS-PAGE followed by Western blot. (b) 
RK3E cells expressing B7-H4 or AP were evaluated using an Annexin V 
assay and Guava detection system to measure apoptosis in response to loss 
of substrate attachment after 24 h (anoikis). 

most of the G418-selected cells expressed the gene of 
interest. AP-control and B7-H4 expressing RK3E cells 
were plated in suspension and the percent of cells 
undergoing apoptosis in response to loss of matrix 
adhesion (anoikis) was measured 24 h later. The cells 
expressing B7-H4 showed a 40% decrease in apoptosis 
compared to the AP-control cells (Fig. 6b) suggesting 
that B7-H4 overexpression can protect cells from 
apoptosis. 

Overexpression of B7-H4 protein promotes xenograft tumor 
formation in SCID mice 

To examine further the transforming ability of B7-H4, 
we evaluated the effect of ectopic B7-H4 expression on 
tumor formation by the human SKOV3 ovarian cancer 
cell line. SKOV3 cells were chosen since they represent a 
relevant tumor cell type for B7-H4 studies but express 
very low, almost undetectable, levels of endogenous B7- 
H4 mRNA and protein (data not shown). SKOV3 cells 
were infected with either a retrovirus expressing B7-H4 
or a control retrovirus followed by G418-selection to 
enrich for infected cells. Expression of B7-H4 protein in 
the selected cells was verified by Western blot (Fig. 7b). 
Control and B7-H4 over-expressing SKOV3 cells were 
implanted subcutaneously into SCID/Beige mice which 
were then monitored for tumor formation. The SKOV3 
tumor cell line is reported to form tumors as xenografts 
in mice [24,25]. As predicted, the control SKOV3 cells 
grew as xenografts where 10 out of 10 mice implanted 
formed tumors (Fig. 7a). All of the mice implanted with 
cells expressing ectopic B7-H4 protein also formed 
tumors; however, these tumors were larger throughout 

Fig. 7. Overexpression of B7-H4 facilitates xenograft tumor formation in SCID/Beige mice. SKOV3 cells were infected with either a retrovirus expressing B7- 
H4 or a control retrovirus followed by G418-selection. (a) Control and B7-H4-expressing SKOV3 cell pools were implanted subcutaneously into 10 SCID/Beige 
mice per cell type and tumor formation was monitored. The graph shows mean group tumor volume over time. (b) Expression of B7-H4 protein in the 
G418-selected cells used for implantation in comparison to extracts of the resulting tumor xenografts was verified by SDS-PAGE followed by Western blot. (c) 
Immunohistochemical staining of a B7-H4 overexpressing SKOV3 tumor and a control SKOV3 tumor with the A57.1 antibody. 

the time course when compared to the control cells (Fig. 
7a). Statistical analysis of the data showed that the 
increased size of B7-H4-SKOV3 tumors compared to 
control-SKOV3 tumors was significant. At the end of the 
study, the tumors were excised and a Western blot 
showed that the expression level of B7-H4 in the tumors 
was similar to that observed in the cells prior to 
implantation (Fig. 7b). Immunohistochemical staining of 
tumor sections with an antibody against B7-H4 also 
showed strong cell surface staining of a majority of the 
tumor cells (Fig. 7c) whereas no B7-H4 protein was 
detectable in the control tumors (Fig. 7c). 



Discussion 

We used genomic and molecular biology tools to search 
for genes that are upregulated in human cancers and 
discovered that B7-H4 mRNA was overexpressed in serous 
ovarian cancers and a majority of breast cancers with little 
or no expression in a variety of normal tissues surveyed. 
Western blots with a monoclonal antibody against B7-H4 
showed that B7-H4 protein expression reflected this mRNA 
distribution. In agreement with these data, immunohisto- 
chemical staining of breast ductal adenocarcinoma and 
serous ovarian cancer revealed intense cell surface and 
cytoplasmic staining whereas no staining of normal ovary 
and low levels of apical staining of normal breast epithelium 
was detected. We also confirmed and extended these results 
in a comprehensive immunohistochemical analysis of 
human breast and ovarian cancers as well as a normal 
tissue panel [18,26]. In contrast to our results, another study 
used RT-PCR to detect B7-H4 mRNA expression in a 
variety of normal human tissues [12]. However, these 
authors indicated that no B7-H4 protein could be identified 
in the same tissues by immunohistochemistry. On the other 
hand, B7-H4 protein was shown to be inducible in T cells 
and antigen presenting cells and was detected by immuno- 
histochemistry in some cases of lung cancer and ovarian 
adenocarcinoma [12,13]. Our findings agree with and 
extend these observations. Of particular note is our finding 
that B7-H4 overexpression is confined to the serous ovarian 
cancer subtype but is not observed with mucinous or low 
malignant potential ovarian cancers. The molecular basis for 
this observation is unclear but could reflect key differences 
in signaling pathway activation between these histologically 
different tumor types. We also describe for the first time the 
overexpression of B7-H4 mRNA and protein in a majority 
of breast cancers of different histological subtypes whereas 
significantly lower levels of B7-H4 could be detected in 
normal breast tissue. In other experiments, we detected 
overexpression of B7-H4 in uterine endometrial cancers 
(data not shown). 

Of interest is the molecular mechanism whereby B7-H4 
is apparently constitutively expressed in ovarian and breast 
cancer cells in contrast to the inducible expression in normal 
immune cells. We hypothesize that signaling pathways that 
are activated in tumor epithelial cells during the progression 
of ovarian and breast cancers could lead to the constitutive 
expression of B7-H4 mRNA and protein. The signals could 
include the participation of β-catenin, Ras or tyrosine kinase 
receptors, such as Her2, EGF, HGF and FGF receptors 
which have been implicated in development and progression 
of these cancers [27-29]. Experiments are in progress to 
uncover tumor-specific signal transduction pathways that 
lead to activation of B7-H4 expression. 

Whereas the mouse ortholog of B7-H4 (B7S1) was 
reported to be a GPI-linked protein [11], we were unable to 
release native B7-H4 protein from a human breast cancer 
cell line with PI-specific PLC (data not shown) and 
conclude that the human protein is not GPI-linked. A 
similar conclusion was reached in another study based on 
PI-PLC experiments with transfected cells [13]. B7-H4 has 
a predicted C-terminal transmembrane domain and we have 
proven directly by cell surface biotinylation and by 
immunofluorescence that B7-H4 protein is localized to the 
surface of the human SKBR3 breast cancer cell line. Similar 
results were obtained with other cell types expressing B7- 
H4 as well as by using flow cytometry with live cells and 
antibodies specific for B7-H4 (data not shown). Our data 
also demonstrate that B7-H4 is extensively N-glycosylated 
and this glycosylation appears to be heterogeneous between 
tumor cell lines and tumor tissues. These differences in 
carbohydrate modification may reflect individual tumor- 
specific glycosylation patterns and could modulate the 
ability of B7-H4 to interact with its receptors or potential 
partner proteins. 

B7-H4 binds to a putative receptor expressed on 
activated but not naive T cells and thereby leads to 
inhibition of T cell activation and IL2 production [10-12]. 
B7-H4 does not interact with receptors for other B7 family 
members including CD28, ICOS, CTLA-4 or PD-1 [10] and 
to date the specific identity of the cell surface receptor that 
binds B7-H4 has not been elucidated. A receptor called B 
and T lymphocyte attenuator (BTLA) was proposed for B7- 
H4 based on indirect evidence showing that activated 
lymphocytes lacking BTLA exhibited reduced binding to 
B7-H4 compared to the wild type cells [10,30]. However, 
B7-H4 and BTLA do not interact with each other based on 
direct binding experiments ([31] and diaDexus unpublished 
results). It is still possible that BTLA could regulate the 
expression, cell surface localization or ligand binding of a 
B7-H4 receptor. 

Despite the lack of a defined B7-H4 receptor, it is clear 
that B7-H4 binding to T cells leads to inhibition of T cell 
activation and, conversely, a neutralizing antibody against 
B7-H4 enhanced antigen specific T cell responses [10-12]. 
Since B7-H4 is overexpressed in breast and ovarian cancers, 
it could function to inhibit a host anti-tumor response and 
promote tumor escape from immune surveillance. Conse- 
quently, blockade of tumor associated B7-H4 could offer a 
new therapeutic opportunity for enhancement of anti-tumor 
immune responses. A similar function has been described 
for another B7 family protein, B7-H1 (PD-L1), which is 
normally expressed by macrophage lineage cells and is also 
abundant in a variety of cancers [32-36]. B7-H1 binds to its 
receptor, PD-1, on activated T cells and B cells leading to 
cell cycle arrest and apoptosis [32,37]. Tumor-expressed 
B7-H1 was able to inhibit T cell activation and increase 
apoptosis of antigen specific T cells leading to increased 
tumor growth in mice and these inhibitory effects could be 
abrogated with B7-H1 blocking antibodies [35,36,38]. 

The B7-H4 functional studies described here were 
carried out either with cultured epithelial cells in the 
absence of immune cell types or with human cancer cell 
line xenografts in SCID mice which lack most elements of a 
functional immune system. Therefore, our data suggesting 
that B7-H4 can inhibit apoptosis and promote tumor cell 
growth reveal a new tumor-specific function for this protein 
aside from its ability to modulate immune cell function. 
Furthermore, while B7-H4 is a membrane protein, it lacks a 
cytoplasmic domain and consequently it is likely to 
participate in signal transduction through binding to a 
receptor on tumor cells in either a cell-autonomous or non- 
autonomous fashion. B7-H4 could also function as a co- 
receptor for another membrane receptor with signaling 
functions. It will be of interest to define whether the 
receptor(s) and signal transduction pathways mediating the 
inhibitory effects of B7-H4 on T cells are the same as those 
participating in the tumor promoting effects of B7-H4 in 
epithelial cancer cells. The existence of two different 
receptors with opposing signaling outcomes has been 
documented for other B7 family members, B7-1 and B7-2, 
which bind to CTLA-4 on activated T cells to mediate a 
negative signal and to CD28 on resting cells providing a 
co-stimulatory signal [8,39]. Experiments are underway to 
define the players and signaling pathways that are activated 
by B7-H4 in tumor cells. 

In summary, we have shown that B7-H4 mRNA and 
protein are overexpressed in a majority of serous ovarian 
cancers and breast cancers with little or no expression in 
normal tissues. We show conclusively that B7-H4 protein in 
tumor tissue and tumor cell lines is heavily glycosylated and 
localized to the cell surface. Our data also provide a new 
function for overexpressed B7-H4 in promoting epithelial 
cell transformation. Together with the proposed immuno- 
modulatory role for B7-H4, our new data validate B7-H4 as 
a promising new target for therapeutic antibody development 
and treatment of breast and ovarian cancers. 

Professor in both Medicine and Pathology at the Harvard 
Medical School. I am also the Director of the Reproductive 
Endocrine Reference Laboratory and the Boston Area Diabetes 
Endocrine Research Center's Immunoassay Core laboratory. 
The laboratories that I oversee perform clinical testing 
services for patient care, human investigations and animal 
research. My basic investigations involve the biochemistry, 
physiology and pathophysiology of the activin-binding 
protein follistatin. Recent studies have focused upon 
identifying the specific epitopes directly involved with 
activin binding and delineating allosteric effects of 
activin binding in altering domain-specific antigenic 
epitopes in the holoprotein. In addition, my laboratory is 
involved in clinical studies directed toward developing 
methods and technological approaches for serum measurements 
of novel ovarian cancer markers using SELDI and other high 
resolution proteomic procedures. My laboratory also conducts 
studies of commercially available and new immunodiagnostic 
assays to evaluate their analytical performance and clinical 
utility. These studies encompass immunoassays for endocrine 
hormones, tumor markers, cardiac markers, and therapeutic 
drugs. 



3. Having worked in the area of immunodiagnostics for 
over 20 years, I am very familiar with the methods and tools 
used to identify antibodies for a protein or peptide encoded 
by a defined nucleic acid. 



4. I have reviewed the above-referenced patent 
application and the Office Action mailed October 22, 2007. 
In particular, I have reviewed the Examiner's reasoning 
behind his statement that "the specification does not teach 
the protein sequence or the open reading frame of SEQ ID 
NO:1" and "[t]hus . . . does not provide enough information 
to indicate for which protein the claimed antibodies are 
specific". I disagree that utility of antibodies for a 
diagnostic cancer marker expressed by a defined nucleic acid 
is dependent upon identification of "the" protein sequence 
or the open reading frame. 



5. The Sequence Listing of the patent application sets 
forth a number of nucleic acid sequences including SEQ ID 
NO:1 and associated fragments, e.g. SEQ ID NO: 10, SEQ ID 
NO: 11, SEQ ID NO: 12 and SEQ ID NO: 13. A number of 
characteristics of SEQ ID NO:1 are described in the patent 
application and/or are ascertainable from the nucleic acid 
sequence itself. Perhaps of most importance is data 
presented in Examples 1 and 2 of the patent application 
relating to mRNA overexpression of Ovr110. This data 
demonstrates to me the utility of Ovr110 as a diagnostic 
marker for gynecologic cancers. 

6. Generating proteins and peptides encoded by a 
nucleic acid was routine as of 1998, and there were a number 
of computer programs routinely used as of 1998 to identify 
potential open reading frames and deduced proteins and 
peptides expressed by a defined nucleic acid sequence. 



7. Also routine as of 1998 was to utilize the 
generated proteins and peptides encoded by the defined 
nucleic acid sequence (such as SEQ ID NO:1 or its fragments) 
to make antibodies and to routinely select antibodies for 
their ability to detect cancer. 



As discussed below, once the nucleic acid sequence is 
specified there were several approaches available to those 
skilled in the art in 1998 to generate antibodies that could 
be used to formulate tests for circulating proteins 
originating from the nucleic acid sequence revealed. It is 
the teaching of the patent that this sequence, and the other 
sequences revealed by the "ovary specific gene" approaches 
described, is associated with ovarian cancer that focuses 
the well established routine work needed to then generate 
and validate antibody-based diagnostic methods that utilize 
the coded proteins as biomarkers for ovarian cancer. 

8. Further, I disagree with the Examiner's suggestion 
that identification of a protein sequence or an ORF in the 
patent application is required for one of skill to identify 
structural or functional attributes of antibodies to 
proteins or peptide fragments of a defined nucleic acid 
sequence. 

The nucleic acid sequences contain all the information 
needed for one skilled in the art to predict, using software 
tools available in 1998, all proteins that could be coded. 
These protein sequences could then be used in homology 
searches, again using software and databases available at 
the time, to identify target immunogens for specific 
antibody generation. 
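
The prediction step described in this paragraph can be pictured as a six-frame open reading frame scan. The following is a minimal sketch for illustration, not the software referred to in this declaration: the function names and the toy sequence are hypothetical, and it assumes the standard genetic code and a simple rule that an open reading frame runs from the first ATG to the next in-frame stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=3):
    """Yield (strand, frame, protein) for each ATG-to-stop ORF in six frames.

    ORFs that run off the end of the sequence without a stop are ignored.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = None  # None means we are not currently inside an ORF
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X = ambiguous codon
                if protein is None:
                    if aa == "M":
                        protein = "M"
                elif aa == "*":
                    if len(protein) >= min_codons:
                        yield strand, frame, protein
                    protein = None
                else:
                    protein += aa


# Hypothetical toy sequence, not SEQ ID NO:1.
for strand, frame, protein in find_orfs("CCATGGCTAAATGACC"):
    print(strand, frame, protein)
```

Each deduced protein from such a scan could then be passed to a homology search, as the paragraph above describes.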

The predicted sequences could also have been subjected 
to antigenic epitope modeling to identify small immunogens 
that could easily be synthesized and used to generate panels 
of site specific monoclonal antibodies which then could be 
routinely selected for recognition of endogenous protein 
products of the nucleotide sequences taught in the patent. 
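
As a rough illustration of how a short candidate immunogen might be picked from a predicted protein, the sketch below finds the most hydrophilic window of a sequence, on the assumption that hydrophilic stretches are more likely to be surface exposed. This is a deliberately simplified stand-in for antigenic epitope modeling, not a method taken from the patent application: it uses the Kyte-Doolittle hydropathy scale, and the function name, window width and toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_window(protein, width=7):
    """Return (1-based start, peptide, mean hydropathy) of the window with
    the lowest mean Kyte-Doolittle score."""
    protein = protein.upper()
    if len(protein) < width:
        raise ValueError("protein is shorter than the window width")
    best_start, best_score = 0, None
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = sum(KD[aa] for aa in window) / width
        if best_score is None or score < best_score:
            best_start, best_score = start, score
    return best_start + 1, protein[best_start:best_start + width], best_score


# Hypothetical toy sequence: a charged stretch flanked by hydrophobic residues.
start, peptide, score = most_hydrophilic_window("LLLLVVVKDEKRDELLLL")
print(start, peptide, round(score, 2))
```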

9. Thus, I believe this patent application does 
provide sufficient information to identify antibodies or 
antibody fragments that bind to and/or detect proteins or 
protein fragments expressed by SEQ ID NO:1 which are useful 
as cancer diagnostic agents. 



I hereby declare that all statements herein of my own 
knowledge are true and that all statements made on 
information or belief are believed to be true; and further 
that these statements were made with the knowledge that 
willful statements and the like so made are punishable by 
fine or by imprisonment, or both, under §1001 of Title 18 of 
the United States Code, and that such willful statements may 
jeopardize the validity of the application, any patent 
issuing there upon, or any patent to which this verified 
statement is directed. 




Patrick M. Sluss 



4/21/2008 

Loning Fu, Mark D. Minden and 
Sam Benchimol 

The Ontario Cancer Institute/Princess Margaret Hospital and 
Department of Medical Biophysics, University of Toronto, 
610 University Avenue, Toronto, Ontario, Canada M5G 2M9 

In blast cells obtained from patients with acute myelogenous 
leukemia, p53 mRNA was present in all the 
samples examined while the expression of p53 protein 
was variable from patient to patient. Mutations in the 
p53 gene are infrequent in this disease and, hence, 
variable protein expression in the majority of the 
samples cannot be accounted for by mutation. In 
this study, we examined the regulation of p53 gene 
expression in human leukemic blasts and characterized 
the p53 transcripts in these cells. We found control 
both at the level of RNA abundance and at the 
level of translation. Four experiments point towards 
translational control of human p53 gene expression. 
First, there is no correlation between the level of p53 
mRNA and the level of p53 protein expression in blast 
cells. Second, in two cell lines with similar levels of 
p53 protein expression but with different levels of p53 
mRNA, we find that there is preferential association 
of p53 mRNA with large polysomes in the cells with 
less p53 RNA. Third, translation of synthetic human 
p53 transcripts in cell-free extracts is inhibited by the 
p53 3'UTR. Fourth, the p53 3'UTR, when present 
in cis, can repress translation of a heterologous 
transcript. These observations raise the possibility that 
human p53 mRNA translation may be regulated in vivo 
by RNA binding factors acting on the p53 3'UTR. 
Keywords: acute myelogenous leukemia/p53/translational 
control 



Introduction 

Human acute myelogenous leukemia (AML) is a clonal 
disease arising in a very early hematopoietic progenitor 
cell following multiple carcinogenic events (Wiggans 
et al, 1978; Fialkow et al, 1987). Mutation of the p53 
tumor suppressor gene occurs infrequently in the blast 
cells of AML patients (Fenaux et al, 1991, 1992; 
Slingerland et al, 1991; Sugimoto et al, 1991, 1993; 
Zhang et al, 1992; Trecca et al, 1994; Wattel et al, 
1994; Lai et al, 1995). p53 mutations have been detected 
in ~10% of all AML patients, mostly in patients with 17p 
monosomy who had lost the normal remaining p53 allele 
(Lai et al, 1995). These studies demonstrate that p53 
mutations are not required for the development of AML. 
Mutations that do arise, however, are generally recessive 
in nature, indicating a strong selective pressure to eliminate 
completely wild-type p53 protein function. 

The scarcity of p53 gene mutations in AML is not 
unique to this disease. For example, p53 gene mutations 
are rare in neuroblastoma, testicular tumors and HPV- 
positive cervical cancer. While the p53 gene is most 
commonly inactivated through mutation in human tumors, 
p53 protein function can also be disrupted through 
epigenetic mechanisms including protein-protein interaction 
(Scheffner et al, 1990; Momand et al, 1992; Oliner et al, 
1992; Ueda et al, 1995), protein conformational change 
(Milner, 1991; Ullrich et al, 1992) and nuclear exclusion 
(Moll et al, 1992, 1995). Indeed, two groups have 
suggested that inactivation of wild-type p53 protein in 
AML occurs through a mechanism involving 
conformational change of the protein (Zhu et al, 1993; Zhang 
et al, 1992). 

The level of p53 protein expression in primary blast 
cells obtained from AML patients varies from patient to 
patient. In previous studies from this laboratory, p53 
protein expression was detected in only 45% (34 of 
75) blast samples examined by metabolic labelling with 
[35S]methionine and immunoprecipitation (Smith et al, 
1986; Benchimol et al, 1989; Slingerland et al, 1991). 
Zhang et al (1992) detected p53 protein expression in 
blast samples from 75% (37 of 49) AML patients. Several 
reasons may explain the absence or very low level of p53 
protein expression in certain blast samples. These include 
low levels of p53 mRNA, inhibition of p53 mRNA 
translation and extremely rapid turnover of newly 
synthesized p53 protein. In this study, we have examined the 
regulation of p53 gene expression in human AML blasts 
and find control both at the level of RNA abundance and 
at the level of translation. Translational regulation is 
supported by experiments in which we demonstrate that 
the p53 3' untranslated region (3'UTR) can repress 
translation of p53 RNA and of heterologous transcripts in 
cell-free extracts. 



Results 

Expression of p53 protein in human AML 
Leukemic blast cells from AML patients and three human 
acute leukemia cell lines OCI-M2, OCI/AML-3 and 
OCI/AML-4 were characterized for p53 protein expression by 
metabolic labelling and immunoprecipitation. OCI-M2 is 
a human erythroleukemia cell line (Papayannopoulou 
et al, 1988) previously shown to contain a missense 
mutation in the p53 coding region at codon 274 and to 
have lost the homologous wild-type p53 allele (Slingerland 
et al, 1991). OCI/AML-3 and OCI/AML-4 cell lines were 
derived from the primary blasts of two AML patients 
(Wang et al, 1989). The full-length p53 transcripts in 
these cells were amplified by RT-PCR, and the products 
directly sequenced. We found that the p53 transcripts in 
both cell lines were wild-type throughout their coding 
regions as well as through their 5'- and 3'UTRs. The only 
difference detected in the p53 transcripts expressed in 
OCI/AML-3 and OCI/AML-4 cell lines was the recognized 
polymorphism at codon 72 (Matlashewski et al, 1987) 
resulting in an arginine residue in OCI/AML-3 and a 
proline residue in OCI/AML-4 at position 72. 

Fig. 1. Expression of p53 protein in human leukemia cells. (A) Cell 
lines OCI-M2, OCI/AML-3 and OCI/AML-4, and blast cells from 
AML patients were metabolically labelled with [35S]methionine for 
15 min at 37°C. Cell extracts were prepared and portions representing 
equal amounts of trichloroacetic acid-insoluble radioactivity 
(10^7 c.p.m.) were immunoprecipitated with the control monoclonal 
antibody (PAb419) or with monoclonal antibodies against p53 
(PAb421). (B) Detection of p53 protein in 5×10^6 cells by Western 
immunoblotting and ECL using PAb1801 monoclonal antibodies. 

The level of protein expression measured by metabolic 
labelling and immunoprecipitation is dependent primarily 
on the rate of protein synthesis, the rate of protein 
degradation and the amount of mRNA available for 
translation. To minimize the contribution of protein 
half-life on the detection of p53 protein synthesis during the 
metabolic labelling assay, cells were exposed to a short 
15 min pulse of [35S]methionine at 37°C followed by 
immediate lysis on ice in the presence of protease 
inhibitors. Radiolabeled cell extracts prepared in this way were 
then subjected to immunoprecipitation with p53-specific 
antibodies. p53 protein with a half-life much less than 15 
min, however, might remain undetectable by this assay. 
p53 protein synthesis was detected in OCI/AML-3, 
OCI/AML-4 and in OCI-M2 (Figure 1A) as well as in seven 
of 16 blast samples tested; three representative examples 
are shown in Figure 1A. 

The steady-state level of p53 protein in the three cell 
lines was determined by Western blot analysis using 
PAb1801. Densitometric scanning of the blot shown in 
Figure 1B revealed that the amount of p53 protein in 
OCI/AML-3 and OCI/AML-4 was similar and ~10-fold lower 
than in OCI-M2. The high level of p53 protein in 
OCI-M2 was expected since mutant p53 polypeptides usually 
have much longer half-lives than wild-type p53 proteins 
and as a result mutant p53 polypeptides accumulate 
intracellularly. 

Fig. 2. Northern blot analysis of p53 mRNA in human AML cells. 
20 µg of total RNA isolated from cell lines or patient blast samples 
was separated on a 1% agarose gel containing 6% formaldehyde, 
transferred to nitrocellulose and hybridized with 32P-labelled human 
p53 cDNA. After autoradiography, the probe was removed and the 
filters were hybridized with a probe specific for 18S ribosomal RNA. 
The relative abundance of p53 mRNA was determined by 
phosphorimage analysis after normalizing to the value of 18S 
ribosomal RNA in each sample. 

Fig. 3. Relative abundance of p53 mRNA in cells that do or do not 
express detectable p53 protein. p53 protein synthesis was assessed in 
16 AML blast samples by metabolic labelling with [35S]methionine for 
15 min and immunoprecipitation. p53 protein synthesis was detected 
in seven of these samples. p53 mRNA levels were determined by 
Northern blot analysis as described in the legend to Figure 2. 

Expression of p53 mRNA in human AML 
To determine whether the differences in p53 protein 
expression in leukemic blasts reflected differences in the 
abundance of p53 mRNA, RNA was isolated from AML 
blast samples and cell lines, and subjected to Northern 
blot analysis. The relative abundance of p53 mRNA in 
cells was estimated by phosphorimage analysis after 
normalizing to the value of 18S ribosomal RNA in each 
sample. The results are shown in Figure 2 and indicate 
that the 16 AML blast samples examined synthesized a 
single species of full-length p53 mRNA of ~2.8 kb. 
The relative amount of p53 mRNA in the 16 samples 
varied over a 27-fold range. No correlation was evident 
between p53 protein expression (on the basis of the 
15 min metabolic labelling assay) and the level of p53 mRNA 
in AML blasts (Figure 3). 
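
As an illustration only, and not part of the original analysis, the normalization described above (p53 signal divided by the 18S ribosomal RNA signal of the same lane, followed by a comparison of the highest and lowest normalized values) can be sketched in Python. The sample names and intensities are hypothetical placeholders, not measurements from this study, and the helper names `relative_abundance` and `fold_range` are assumptions introduced for the sketch.

```python
# Hypothetical phosphorimager band intensities (arbitrary units).
# These numbers are placeholders, not data from the study.
samples = {
    "blast_A": {"p53": 1200.0, "rRNA_18S": 4000.0},
    "blast_B": {"p53": 300.0, "rRNA_18S": 5000.0},
    "blast_C": {"p53": 2700.0, "rRNA_18S": 3000.0},
}


def relative_abundance(signals):
    """Divide each p53 signal by the 18S rRNA loading-control signal."""
    return {name: s["p53"] / s["rRNA_18S"] for name, s in signals.items()}


def fold_range(normalized):
    """Ratio of the highest to the lowest normalized abundance."""
    values = list(normalized.values())
    return max(values) / min(values)


normalized = relative_abundance(samples)
for name, value in sorted(normalized.items()):
    print(f"{name}: {value:.3f}")
print(f"fold range: {fold_range(normalized):.1f}")
```

Dividing by the loading control before comparing samples is what allows a statement such as "varied over a 27-fold range" to reflect mRNA abundance rather than unequal loading of the gel.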

OCI/AML-3 and OCI/AML-4 cells contained similar 
amounts of p53 protein. However, the RNA blot shown 
in Figure 2 indicated that the abundance of p53 mRNA 
was tfold higher in OCI/AML-3 than in OCI/AML-4. A 
4- to 8-fold difference in p53 RNA was seen in repeated 
experiments after normalization with probes that detect 
18S ribosomal RNA or GAPDH to ensure equivalent 
loading of RNA samples on the gels. We conclude that 
p53 RNA levels and p53 protein expression are variable 
in AML blasts and cell lines, and that the level of p53 
protein expression is not related to the amount of p53 
mRNA in these cells. 

Association of p53 mRNA with polysomes 
To test whether p53 gene expression is under translational 
control in vivo, the association of p53 mRNA with 
polysomes in OCI/AML-3 and OCI/AML-4 cells was 
analyzed (Figure 4). If p53 mRNA is more translationally 
active in OCI/AML-4 than in OCI/AML-3 as the above 
results suggest, then a larger proportion of the p53 mRNA 
in OCI/AML-4 cells should be associated with 
polysomes compared with OCI/AML-3. Cells were 
collected and lysed in the presence of cycloheximide and 
MgCl2, which stabilize the association of ribosomes with 
mRNA. The lysates were sedimented through a 15-40% 
sucrose gradient and fractions were collected. RNA was 
extracted from each fraction and analyzed for the presence 
of p53 mRNA by dot-blot hybridization with a 32P-labelled 
p53 cDNA probe. The gradients were calibrated with 
polysomes prepared from lysates by precipitation with 
100 mM MgCl2. Polysomes were found at the bottom of 
the gradient in fractions 5-10, while monosomes were 
found in fractions M. p53 mRNA from OCI/AML-4 
cells was associated with larger polysomes than was p53 
mRNA from OCI/AML-3 cells (Figure 4). In OCI/AML-4 
cells, 39% of the p53 mRNA was found in fractions 
7-10 containing high molecular weight polysomes, while in 
OCI/AML-3 cells 21% of the p53 mRNA was found in 
these same fractions. As an internal control, the distribution 
of ribosomal protein L35 RNA was compared and shown 
to be identical in OCI/AML-3 and OCI/AML-4 (data 
not shown). 

Fig. 4. Association of p53 mRNA with polysomes in OCI/AML-3 (A) 
and OCI/AML-4 (B) cells. The association of p53 mRNA with 
polysomes in OCI/AML-3 and OCI/AML-4 cells was compared. Cell 
extracts containing polysomes were prepared in the presence of 
cycloheximide and loaded on a 15-40% sucrose gradient. Ten 
fractions were collected and the amount of p53 mRNA in each 
fraction was determined by dot-blot hybridization analysis with a 
32P-labelled human p53 cDNA probe. The size of the polysomes with 
respect to the gradient was estimated using a polysome preparation 
from OCI/AML-3 cells. The positions of free ribosomes, monosomes 
and polysomes are indicated. Error bars represent the standard error of 
the mean from three separate experiments. 
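
The polysome comparison in this section rests on one quantity: the share of the total p53 mRNA signal recovered in the heavy fractions of the gradient (reported as 39% versus 21% in fractions 7-10). The following Python sketch shows how such a share could be computed from per-fraction dot-blot signals. It is an illustration under stated assumptions: the per-fraction values are hypothetical, since the individual fraction signals are not given in the text, and the function name `percent_in_fractions` is introduced here for the example.

```python
def percent_in_fractions(signal_by_fraction, fractions):
    """Percentage of the total signal found in the selected gradient fractions."""
    total = sum(signal_by_fraction.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    selected = sum(signal_by_fraction[f] for f in fractions)
    return 100.0 * selected / total


# Hypothetical dot-blot signals for gradient fractions 1-10
# (arbitrary units; placeholders, not data from the study).
profile = {1: 5, 2: 10, 3: 15, 4: 15, 5: 10, 6: 10, 7: 10, 8: 10, 9: 10, 10: 5}

# Share of signal in fractions 7-10, the high molecular weight polysomes.
heavy_share = percent_in_fractions(profile, range(7, 11))
print(f"{heavy_share:.1f}% of signal in fractions 7-10")
```

Expressing each profile as a percentage of its own total makes two cell lines comparable even when their absolute p53 mRNA levels differ several-fold.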



Analysis of the 5' end of p53 mRNA 
The human p53 gene has been shown to have a cluster of 
six or seven major transcription initiation sites and several 
minor sites lying further upstream (Tuck and Crawford, 
1989). Transcripts initiating from the minor sites would 
have a longer 5'UTR with potential to form a stable 
stem-loop structure close to the 5' cap. Such structures would 
not be expected to form in transcripts initiating from the 
major start sites. Stem-loop structures were described 
for rodent p53 mRNA (Bienz et al, 1984; Bienz-Tadmor 
et al, 1985). Recently, mouse p53 protein was shown to 
bind to the 5'UTR and to inhibit translation of its own 
mRNA in an in vitro assay system (Mosner et al, 1995). 
Stable stem-loop structures in the 5'UTR regions of a 
number of mRNA transcripts have been shown to inhibit 
translation initiation by interfering with the activity of 
translation initiation factors or by serving as binding sites 
for regulatory proteins that inhibit translation (Beng and 
Holland, 1988; Fu et al, 1991; Melefors and Hentze, 
1993; Pause et al, 1993). 

To determine if the low level of p53 protein expression 
in leukemic blasts was the result of transcription initiation 
at the minor start sites, the 5' ends of p53 mRNA present 
in different blast samples and cell lines were mapped 
using an RNase protection assay. A 729 nucleotide 
antisense RNA probe containing genomic sequences from 
the p53 promoter region fused with cDNA sequences 
extending into exon 4 was generated by transcription with 
SP6 RNA polymerase in the presence of [32P]UTP (Figure 
5A). This probe would yield protected p53 fragments of 
385 nucleotides corresponding to transcripts originating 
from the major start site and 449 nucleotides corresponding 
to transcripts originating from the most 5' of the minor 
start sites. Total RNA extracted from OCI/AML-3 and 
OCI/AML-4 cell lines and from seven AML blast samples 
was examined. After digestion, the protected fragments 
were resolved by electrophoresis on a denaturing 
polyacrylamide gel. As shown in Figure 5B, the major 
protected fragment in all the RNA samples was 385 
nucleotides in length, consistent with initiation 
of p53 gene transcription in leukemic blasts at the major 
start site. These data indicate that, in contrast with murine 
p53 mRNA, stable secondary structures are unlikely to 
exist at the 5' end of human p53 mRNA. 

Fig. 5. RNase protection assay. (A) The map of the p729 plasmid. The 
p729 plasmid was constructed as described under Materials and 
methods. After linearization with HindIII, a 729 nucleotide antisense 
RNA probe was generated by transcription with SP6 RNA polymerase 
yielding protected p53 fragments of ~385 nucleotides due to p53 
transcripts initiating from one of the major start sites (a) and 449 
nucleotides due to p53 transcripts initiating from the most 5' of the 
minor transcription start sites (b). (B) The 729 nucleotide [32P]UTP- 
labelled antisense RNA probe was annealed to 30 µg of total RNA 
extracted from OCI/AML-3 and OCI/AML-4 cell lines and seven 
AML blast samples before digestion with RNase A and RNase T1. 
The protected fragments were separated by electrophoresis on a 6% 
polyacrylamide-8 M urea gel and visualized by autoradiography. The 
positions and size (nucleotide length) of 5' end-labelled fragments of 
MspI-digested pBR322 plasmid DNA are indicated on the left. The 
bottom arrow indicates the position of the major protected fragment 
and the top arrow indicates the undigested probe. 

Analysis of the 3' end of p53 mRNA 
Human p53 mRNA contains a long 3'UTR of 1176 
nucleotides with an Alu-like repetitive sequence element 
of ~470 bp located immediately upstream of the poly(A) 
tail (Matlashewski et al, 1984). The Alu-like sequence is 
in the reverse transcriptional orientation with respect to 
the p53 gene. Furthermore, the Alu-like sequence is 
missing in murine p53 transcripts and it interrupts a region 
in human p53 mRNA which shows homology to mouse 
p53 mRNA. When analyzed with the FOLD program of 
GCG, the Alu-like element in the 3'UTR of human p53 
mRNA is predicted to form an independent secondary 
structure that does not have long-range interactions with 
other regions of p53 mRNA. In the presence of a poly(A) 
tail, the secondary structure formed by the Alu-like element 
is predicted to remain essentially intact except that a 50 
nucleotide U-rich sequence at the 5' boundary of the 
Alu-like sequence will interact with the poly(A) tail. The 
extended base pairing between U and A residues will 
further stabilize the secondary structure formed by the 
Alu-like element. To determine whether or not the 
Alu-like repeat present in human p53 mRNA might constitute 
a negative regulatory element during translation, a series 
of in vitro transcription-translation experiments was 
performed. 

Fig. 6. In vitro translation of synthetic p53 RNA containing variable 
portions of the 3'UTR. (A) Plasmid template used to synthesize p53 
RNA in vitro. The p2516 plasmid was constructed by inserting the 
entire 2.5 kb wild-type p53 cDNA sequence downstream of the 
bacteriophage SP6 promoter in a pSP64-derived plasmid. Transcription 
from the SP6 promoter present in p2516 leads to the production of 
transcripts in which the first 10 nucleotides are derived from plasmid 
sequences while the remaining nucleotides are derived from the p53 
gene beginning at the +7 position of native p53 transcripts initiating 
from one of the major transcription start sites (Tuck and Crawford, 
1989). Linearization of p2516 at the EcoRI site before in vitro 
transcription generates a full-length, Alu-containing p53 transcript 
(T2516); linearization at the BamHI site provides a template for the 
synthesis of a truncated p53 transcript missing a portion of the 3'UTR 
containing the Alu sequence (T2034). Both transcripts were 
polyadenylated in vitro to generate T2516An and T2034An. The open 
rectangles shown in the transcripts represent the position of the p53 
coding region. (B) 50 ng of the in vitro-synthesized T2034, T2516, 
T2034An and T2516An p53 RNAs were translated in a rabbit 
reticulocyte lysate at 30°C for 30 min in the presence of 
[35S]methionine followed by immunoprecipitation, SDS-PAGE and 
autoradiography. In the 3X T2516 and 3X T2516An lanes, 150 ng of 
T2516 or T2516An RNA was added to the in vitro translation 
reaction. The right panel presents the results of a Northern blot in 
which 50 ng of synthetic p53 RNA was applied to an agarose- 
formaldehyde gel, blotted and hybridized to 32P-labelled human p53 
cDNA. 

An SP6-derived plasmid containing human wild-type 
p53 cDNA including the entire 3'UTR was constructed 
(p2516 in Figure 6A). p2516 was linearized with EcoRI 
or with BamHI and used as a template for in vitro 
transcription. In some reactions, a poly(A) tail of 
200-300 adenylic acid residues was added to synthetic p53 
RNA using poly(A) polymerase. In this way, four synthetic 
p53 transcripts were generated: T2516An and T2516 
represent full-length, Alu-containing transcripts with or 
without a poly(A) tail; T2034An and T2034 represent 
truncated, Alu-deficient transcripts with or without a 
poly(A) tail. These transcripts were then used as templates 
for translation in a rabbit reticulocyte lysate containing 
[35S]methionine. p53 protein synthesized in vitro was 
immunoprecipitated with PAb421 monoclonal antibody 
and visualized by autoradiography (Figure 6B). The 
amount and integrity of the synthetic p53 RNAs added to 
the in vitro translation reactions was monitored by agarose 
gel electrophoresis and Northern blotting as shown in the 
right panel of Figure 6B. Densitometric tracing of the 
data indicated that the Alu-containing, non-polyadenylated 
transcript T2516 was translated ^-fold less efficiently than 
the Alu-deficient, non-polyadenylated transcript T2034. In 
addition, the polyadenylated, Alu-containing transcript 
T2516An was translated ~20-fold less efficiently than the 
polyadenylated, Alu-deficient transcript T2034An. These 
data indicate that the Alu-like element present in the p53 
3'UTR can inhibit p53 mRNA translation in vitro even 
in the absence of a poly(A) tail. The predicted interaction 
of the poly(A) tail with the Alu-like element appears to 
increase further the inhibition of translation. 
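
A statement such as "translated ~20-fold less efficiently" compares protein output per unit of input RNA between two transcripts. The Python sketch below shows one plausible way to derive such a ratio from densitometry of the protein band and of the matching RNA band. It is an assumption about the calculation, not a description of the procedure actually used, and all numeric values and function names are hypothetical.

```python
def translation_efficiency(protein_signal, rna_signal):
    """Protein band intensity per unit of input RNA band intensity."""
    if rna_signal <= 0:
        raise ValueError("RNA signal must be positive")
    return protein_signal / rna_signal


def fold_less_efficient(test, reference):
    """How many times less efficiently `test` is translated than `reference`.

    Each argument is a (protein_signal, rna_signal) pair.
    """
    return translation_efficiency(*reference) / translation_efficiency(*test)


# Hypothetical densitometry readings (arbitrary units), not study data.
alu_containing = (50.0, 1000.0)   # (protein band, RNA band)
alu_deficient = (900.0, 900.0)

fold = fold_less_efficient(alu_containing, alu_deficient)
print(f"Alu-containing transcript translated {fold:.0f}-fold less efficiently")
```

Normalizing each protein signal to its own RNA signal keeps small differences in the amount of transcript added to the lysate from being mistaken for differences in translation.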

To test further the inhibitory activity of the p53 3'UTR, 
we examined the ability of the p53 3'UTR to control the 
translation of a heterologous RNA. The 
p53 DNA fragment extending from nucleotides 2034 
to 2516 was excised from plasmid p2516 and inserted 
downstream of a heterologous gene (CAT gene) in an 
SP6-based plasmid vector to generate the plasmid 
pCAT-Alu (Figure 7A). In vitro transcription and translation 
revealed that non-polyadenylated CAT-Alu RNA was 
translated 5-fold less efficiently than non-polyadenylated 
CAT transcripts lacking the Alu sequence (Figure 7B). 
When a different region of the p53 3'UTR (nucleotides 
1465-2034 in plasmid p2516) with approximately the 
same length as the Alu-containing fragment was inserted 
downstream of the CAT gene, no effect on CAT translation 
was observed (CAT-BS in Figure 7B). The ability of the 
Alu-containing fragment of the p53 3'UTR to act on a 
heterologous transcript indicates that it likely represses 
translation independently of upstream sequences. 

The inhibitory activity of the Alu-like element on p53 
translation was likely the result of its action in cis and 
not simply due to non-specific inhibition of translation 
since a 3-fold increase in the amount of Alu-containing 
transcript added to the reticulocyte lysate resulted in a 
corresponding increase in the amount of p53 protein 
synthesized (Figure 6B). Furthermore, when 200 ng of 
luciferase RNA was added to a reticulocyte lysate together 
with 200 ng of CAT-Alu or CAT-BS RNA, there was little 
difference in the amount of luciferase synthesized (Figure 
7C). Similarly, when 200 ng of luciferase RNA was added 
to a reticulocyte lysate, either alone or mixed with 200 ng 
of T2034 or T2516An RNA, there was little difference in 
the amount of luciferase synthesized (data not shown). 

To confirm that the decrease in p53 protein synthesis 
from Alu-containing p53 RNAs was due to translational 
regulation and not due to preferential RNA degradation 
in the reticulocyte lysate, adenylated T2034 and T2516 
synthetic transcripts were added to the rabbit reticulocyte 
lysate under the same conditions as those used for in vitro 
translation. After incubation for 15 or 60 min, RNA was 
extracted from the lysate and the amount of synthetic p53 
RNA present in the lysate determined by Northern blot 
analysis. Enhanced degradation of the Alu-containing 
transcript was not observed (Figure 8). We conclude that 
a segment of the p53 3'UTR encompassing the Alu-like 
element is capable of repressing translation in vitro. 

Fig. 7. The p53 Alu-like element can inhibit translation of a 
heterologous CAT transcript. (A) Plasmids used to generate CAT 
transcripts in vitro. (B) 200 ng of in vitro-synthesized CAT, CAT-Alu 
and CAT-BS transcripts were translated in a rabbit reticulocyte lysate 
at 30°C for 30 min in the presence of [35S]methionine. The reactions 
were stopped by adding an equal volume of 2X protein sample 
buffer, heated to 100°C for 5 min and analyzed by SDS-PAGE and 
autoradiography. An ethidium bromide-stained agarose gel 
demonstrating the integrity and amount of synthetic transcripts that 
were added to the in vitro translation reaction is shown below. 
(C) 200 ng of luciferase RNA was translated in a rabbit reticulocyte 
lysate, either alone or in the presence of 200 ng of CAT-BS or 200 ng 
of CAT-Alu. Reaction mixtures were incubated in the presence of 
[35S]methionine at 30°C for 30 min and processed as in (B). The 
in vitro-synthesized luciferase protein is shown in the top panel. The 
RNA used for in vitro translation is shown in the ethidium bromide- 
stained agarose gel in the bottom panel. 

Discussion 

The observation that wild-type p53 protein expression in 
leukemic blast cells does not correlate with the level of 
p53 mRNA mirrors findings reported previously for blasts 
and other human cell types (Matlashewski et al, 1986; 
Kastan et al, 1991a; Slingerland et al, 1991; Sasano 
et al, 1992; Hsu et al, 1993). The absence of detectable 
p53 protein in cells expressing abundant levels of 
wild-type p53 mRNA has usually been attributed to the short 
half-life of p53 protein in normal cells (Rogel et al, 
1985). A similar situation exists in papillomavirus (HPV)- 
infected cells such as HeLa cells where p53 protein is not 
detected even though these cells produce p53 mRNA and 
this RNA is associated with polysomes (Matlashewski 
et al, 1986). The enhanced degradation of newly 
synthesized p53 protein in HeLa cells was shown to be promoted 
by the papillomavirus E6 protein which is expressed 
constitutively in these cells (Scheffner et al, 1990). 

Fig. 8. Stability of synthetic human p53 RNAs in rabbit reticulocyte 
lysates. 100 ng of adenylated T2034 (A) and T2516 (B) synthetic 
RNA was added to the rabbit reticulocyte lysate and incubated at 30°C 
for 15 or 60 min under the same conditions used for in vitro 
translation. RNA present in the lysates was then extracted and loaded 
on a 1% agarose-formaldehyde gel. The 0 min time point represents 
100 ng of synthetic RNA loaded directly on the gel. The amount of 
p53 RNA in each sample was then determined by Northern blotting 
using a 32P-labelled human p53 cDNA. The lower panel shows the 
28S and 18S ribosomal RNAs recovered from the rabbit reticulocyte 
lysates detected by ethidium bromide staining of the gel. 

In this report, we present data showing that differences 
in p53 mRNA abundance exist in AML blasts and that 
these differences cannot explain the heterogeneity in the 
level of p53 protein expression in leukemic blast cells. 
Using a metabolic labelling assay in which blasts from 
different AML patients were pulse-labelled with 
[35S]methionine for 15 min to minimize the contribution of 
protein half-life on the detection of p53 protein synthesis, 
we found differences in the level of p53 protein expression 
in blast samples. These observations raised the possibility 
that p53 gene expression may be regulated at the 
translational level in certain human cells. We tested this possibility 
by analyzing the distribution of p53 mRNA on polysomes 
in vivo and by examining p53 RNA translation in vitro. 

We have used two AML cell lines, OCI/AML-3 and 
OCI/AML-4, that contain similar amounts of wild-type 
p53 protein even though OCI/AML-3 contains 4- to 8-fold 
more p53 mRNA. Comparison of the polysome profile of 
these cells indicated that a greater proportion of the p53 
mRNA was associated with larger polysomes in 
OCI/AML-4 than in OCI/AML-3. p53 mRNA in both of these 
cell lines as well as in blasts from different AML patients 
is present as a single, full-length species of ~2.8 kb 
that initiates from a common transcription start site and 
contains similar sequence and structural elements. 

Transcription-translation experiments in vitro indicated 
that the p53 3'UTR contains a negative regulatory domain 
that is capable of repressing translation in vitro. A region 
of the 3'UTR consisting of ~500 nucleotides and 
containing an Alu-like element is capable of repressing 
translation of p53 mRNA and of a heterologous transcript. 
The p53 3'UTR, when present in cis, repressed translation 
of polyadenylated as well as non-polyadenylated 
transcripts. Accordingly, we suggest that the Alu-like element, 
possibly through its secondary structure, is capable of 
repressing p53 mRNA translation. In addition, interaction 
of the Alu-like element with the poly(A) tail may repress 
the latter's function in translation. Experiments are in 
progress to map precisely this regulatory element in the 
p53 3'UTR and to determine if the p53 3'UTR plays a 
similar role in regulating translation in vivo. 

Our finding that p53 protein expression in AML blasts 
is controlled, at least in part, through mechanisms acting 
at the translational level, raises the possibility that 
translational regulation may provide an epigenetic mechanism 
to reduce or even eliminate wild-type p53 protein function 
in leukemic blasts. In preliminary experiments to address 
this point, we have exposed blast cells that express little 
or no detectable p53 protein to 6 Gy of ionizing radiation 
and have observed increased steady-state levels of p53 
protein at 1.5 h after irradiation (data not shown). 
Genotoxic agents have been shown previously to increase the 
level and/or activity of p53 protein through a 
post-transcriptional mechanism that is not well understood 
(Kastan et al, 1991b; Fritsche et al, 1993; Lu and Lane, 
1993; Zhan et al, 1993). Hence, blast cells retain the 
ability to up-regulate p53 expression in response to 
genotoxic stress. At least under these conditions, p53 function 
may not be lost. This type of analysis, however, does not 
address the function of p53 in proliferating cells that have 
not been exposed to genotoxic stress. In this regard, 
previous studies from our laboratory demonstrated a highly 
significant correlation between p53 protein expression in 
leukemic blast cells and the secondary plating efficiency 
of these cells (Smith et al, 1986). The latter provides an 
estimate of the self-renewal capacity of progenitor cells 
in the blast population. Deregulated p53 expression might, 
therefore, be expected to affect the self-renewal capacity 
of blasts in the absence of genotoxic stress. 

Accumulating evidence demonstrates the involvement 
of the 3'UTR in translational control (Jackson, 1993). The 
demonstration that the 3'UTR of certain transcripts can 
control mRNA localisation and polyadenylation provides 
a mechanism for translational regulation (Huarte et al, 
1992; Gavis and Lehmann, 1994). In addition, specific 
sequences within 3'UTRs have been shown to repress 
translation (Goodwin et al, 1993; Evans et al, 1994; 
Kwon and Hecht, 1993). RNA-protein interactions are 
likely to be involved in 3'UTR-dependent translational 
repression. Indeed, a protein that binds specifically to the 
3'UTR of protamine 2 mRNA and represses its translation 
has been identified (Kwon and Hecht, 1993). If the p53 
3'UTR can be shown to regulate p53 mRNA translation 
in vivo, it is possible that trans-acting factors (missing or 
inactive in reticulocyte lysates) activate components of the 
translational machinery to bypass this negative regulatory 
domain on human p53 mRNA. Such trans-acting factors 
could interact directly with p53 mRNA to enhance its rate 
of translation. Alternatively, trans-acting factors directed 
to the p53 3'UTR (that are also present in reticulocyte 
lysates) may act as repressors of translation. Differences 
in the level of p53 protein synthesis among AML blasts 
and possibly other human cells could, therefore, be 
determined by differences in the level or activity of these 
regulatory molecules. 

The OCI/AML-3 and OCI/AML-4 cell lines were derived from primary 
blasts of two AML patients (Wang et al, 1989). The OCI-M2 cell line 
was derived from the primary blasts of a patient whose erythroleukemia 
represented the end stage of a previously identified myelodysplastic 
syndrome (Papayannopoulou et al, 1988). OCI/AML-3 and OCI-M2 
cells were grown in alpha-modified minimum essential medium (α-MEM) 
containing 10% fetal calf serum [FCS] (GIBCO). The OCI/AML-4 cells 
were grown in α-MEM containing 10% FCS and 10% conditioned 
medium obtained from the human bladder carcinoma cell line 5637 
(5637-CM) (Wang et al, 1989). The AML blast cells were obtained 
directly from AML patients. The mononuclear cell fraction of peripheral 
blood was collected after separation through Ficoll-Hypaque (Pharmacia) 
(1.077 g/ml) and T-lymphocyte depletion (Minden et al, 1979). These 
cells were stored frozen in liquid nitrogen before use. 

Metabolic labelling and immunoprecipitation
The blast cells of AML patients were thawed and incubated for 2 days
at 37°C in α-MEM containing 10% FCS and 10% 5637-CM before
metabolic labelling. 1X10' cells were labelled with 0.2 mCi [35S]-
methionine (DuPont NEN Research Products) in 0.5 ml α-MEM lacking
methionine and containing 10% dialysed FCS at 37°C for 15 min. Cells
were then immediately pelleted, the radioactive medium removed, and
the cells lysed on ice in a solution containing 25 mM Tris pH 7.4,
50 mM NaCl, 0.5% sodium deoxycholate, 2% NP40, 0.2% SDS, 0.5 mM
phenylmethylsulfonyl fluoride (PMSF), 1 µg/ml leupeptin and 1 µg/ml
aprotinin for 20 min. Lysates were cleared by centrifugation, the
supernatant was retained and incubated with 5 ng of a non-specific
IgG2a mouse monoclonal antibody (Sigma) for 60 min on ice. These
were then reacted with 0.5 ml of a 10% suspension of formalin-treated
Staphylococcus aureus Cowan I cells (Pansorbin, Calbiochem-Behring)
for 30 min on ice, followed by centrifugation and retention of the
supernatant. Portions of precleared lysates containing equal numbers of
trichloroacetic acid-insoluble counts (10^7 c.p.m.) were diluted in NET/
GEL buffer (150 mM NaCl, 5 mM EDTA pH 8.0, 50 mM Tris pH 7.4,
0.05% NP40, 0.02% sodium azide, 0.25% gelatin) and immunoprecipit-
ated on ice for 2 h with PAb421 monoclonal antibodies against p53
protein or control PAb419 antibodies (Harlow et al., 1981). The immune
complexes were collected on 60 µl prewashed protein A-Sepharose
beads (Pharmacia), washed three times with NET/GEL buffer, and eluted
into 30 µl protein sample buffer (2% SDS, 10% glycerol, 0.1%
bromophenol blue, 25 mM Tris pH 6.8, 0.1 M dithiothreitol) by boiling
for 10 min. The Sepharose beads were removed by centrifugation, the
samples were loaded on a 10% polyacrylamide gel containing SDS and
proteins were resolved by electrophoresis at 45 mA. Gels were fixed in
7.5% acetic acid and 25% methanol for 30 min before drying and
exposure to X-ray film (DuPont NEN Research Products).

Western blot analysis

5 × 10^6 cells were lysed directly in an equal volume of 2X protein sample
buffer. The extracts were passed through a 21-gauge needle several
times to reduce viscosity and boiled for 10 min before electrophoresis
at 45 mA on a 10% polyacrylamide gel containing SDS. Resolved
proteins were transferred to a nitrocellulose membrane (Schleicher
& Schuell), and the abundance of p53 protein was estimated by
immunoblotting with a human p53-specific monoclonal antibody
PAb1801 (Banks et al., 1986). Bound antibody was detected using the
enhanced chemiluminescence detection system (DuPont NEN Research
Products) according to the manufacturer's instructions.

Northern blot analysis 

Total cellular RNA was isolated using the guanidinium thiocyanate-
cesium chloride method (Chirgwin et al., 1979). 20 µg of total RNA
was separated by electrophoresis on a 1% agarose gel containing 6%
formaldehyde and transferred to a nitrocellulose membrane (Schleicher
& Schuell). The blots were hybridized with cDNA probes labelled with
[32P]dCTP in a random priming reaction (Feinberg and Vogelstein,
1983), washed and exposed to X-ray film. The amount of RNA
was determined with a Molecular Dynamics PhosphorImager using
MulUquant software. The human p53 probe was the XbaI-EcoRI fragment
of p53 cDNA from the pR4-2 plasmid (Harlow et al., 1985); the L35
probe was the PstI-BamHI fragment from the human ribosomal protein
L35 cDNA (Herzog et al., 1990); the GAPDH probe was a 1.3 kb PstI
fragment of rat GAPDH cDNA (Fort et al., 1985); the 18S ribosomal



gene (Torczynski et al., 1985).
Genomic DNA preparation 

Genomic DNA from OCI/AML-3 and OCI/AML-4 cell lines was isolated
following a modification of the procedure described by Kupiec et al.
(1987). JxtO 7 cells were washed with ice-cold PBS buffer, resuspended
in 3 ml of lysis buffer (20 mM EDTA pH 8.0, 100 µg/ml proteinase K,
0.5% sarkosyl) and incubated at 50°C for 3 h. DNA was extracted
with phenol/chloroform, dialysed against 50 mM Tris-HCl pH 8.0,
10 mM EDTA, 10 mM NaCl at 4°C and then treated with RNase A
(100 µg/ml) at 37°C for 3 h. DNA was again extracted with phenol/
chloroform and dialysed against 10 mM Tris pH 7.4, 1 mM EDTA.
DNA concentration was determined by measuring the absorbance at
260 nm.
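
As a rough illustration of the absorbance-based quantification mentioned above, the following Python sketch converts an A260 reading into a double-stranded DNA concentration. It assumes the common laboratory convention that an A260 of 1.0 (1 cm path length) corresponds to about 50 µg/ml of double-stranded DNA; the function name, the dilution factor and the example reading are hypothetical and are not taken from the paper.

```python
# Conventional conversion factor for double-stranded DNA at 260 nm with a
# 1 cm path length. This is an assumption; the paper does not state it.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate the dsDNA concentration (ug/ml) of the undiluted sample.

    a260            -- absorbance of the (possibly diluted) sample at 260 nm
    dilution_factor -- how many fold the sample was diluted before reading
    """
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor must be > 0")
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.25 measured on a 20-fold dilution.
    print(dsdna_concentration(0.25, dilution_factor=20))
```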

Amplification of p53 sequences from RNA and DNA
20 µg of total RNA was precipitated with ethanol and resuspended in a
30 µl reaction containing 300 ng of oligo(dT) primer (Amersham
International), 50 mM Tris-HCl pH 8.3, 77 mM KCl, 3 mM MgCl2,
3 mM dithiothreitol, 3 mM dNTP, 30 units of RNAguard (Pharmacia)
and 200 units of Moloney murine leukemia virus reverse transcriptase
(GIBCO-BRL) and incubated at 42°C for 60 min. The first strand cDNA
was then used as the template for amplification by PCR using Taq
polymerase (Promega). PCR amplification was performed with 10 µl of
each first strand cDNA as the template and 40 cycles of denaturation
(94°C, 1 min), annealing (64°C, 30 s), and elongation (72°C, 1 min).
The following p53-specific primers were used for amplifying the complete
coding region and the 3'UTR: 5'SX1 (sense, exon 1, GACACTTT-
GCGTTCGGGCTGGGAG), 5'SX5A (sense, exon 5, GAGCGCTGCT-
CAGATAGCGATG), 3'SX11 (sense, exon 11, GAAGGGCCTGACT-
CAGACTGAC), 3'AX-6 (antisense, exon 6, AGATGCTGAGGAGGG-
GCCAGAC), AS-3 (antisense, exon 11, GAGGGAGAGATGGGGGT-
GGGAGGCTGTC) and AS-4 (antisense, exon 11, GGCAGCAAAGT-
TTTATTGTAAAATAAG). The 5'UTR and sequences further upstream
were amplified from 1 µg genomic DNA using the following pair of
p53-specific primers: 5'UTR-1 (sense, promoter region, ACCTAA-
GCTTGTC ATCKKTG ACTGTCC AGCTTTG) and p-EX (antisense, exon
1, CCAATCCAGGGAAGCGTGTCACCG).
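
The sense/antisense primer pairing described above can be sketched in a few lines of Python: an antisense primer anneals where its reverse complement occurs on the sense strand, and the product spans from the start of the sense primer to the end of that site. The helper names and the short template below are hypothetical illustrations, not sequences or software from the paper.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def product_length(template, sense, antisense):
    """Return the expected PCR product length on a sense-strand template.

    Returns None if either primer site is not found (exact match only,
    with the antisense site located downstream of the sense primer).
    """
    start = template.find(sense)
    if start == -1:
        return None
    site = reverse_complement(antisense)
    end = template.find(site, start + len(sense))
    if end == -1:
        return None
    return end + len(site) - start


if __name__ == "__main__":
    # Hypothetical 34-nt template and 8-nt primers, for illustration only.
    template = "TTGACCATGGCTAGCTAGGATCCGTACGTTAAGC"
    print(reverse_complement("AACGTACG"))
    print(product_length(template, "GACCATGG", "AACGTACG"))
```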

Direct sequencing of double-stranded PCR products 
Double-stranded DNA fragments produced by PCR amplification were 
eluted from agarose gels and purified by extraction with phenol/
chloroform. 200 ng of purified PCR product were mixed with human
p53-specific oligonucleotides as sequencing primers, frozen in dry ice,
dried in a centrifugal evaporator (Savant SpeedVac), redissolved in
sequencing buffer (40 mM Tris-HCl pH 7.5, 25 mM MgCl2, 50 mM
NaCl, 10% DMSO) and subjected to the sequencing reaction as described
by Winship (1989).

RNase protection assay 

Plasmid p729 was constructed from three DNA fragments in two stages.
A 330 bp DNA fragment derived from the human p53 gene promoter
was excised from the p2E-H2BX plasmid (Lamb and Crawford, 1986)
with HindIII and XbaI and inserted into the pGEM-4 plasmid (Promega)
between the HindIII and XbaI sites. In the second stage, a fragment
corresponding to the 5' end of p53 mRNA was obtained by RT-PCR
using p53 mRNA prepared from OCI/AML-3 cells and the p53-specific
primers 5'UTR-3 (sense, exon 1, (XOGAAOCTTCAAAAOJCEAj.
OAOCCACCGTCCAO) and 5'AX4 (antisense, exon 4, OGTGTAOO-
ACKrrOCTGCTGGTGC). The resulting fragment was end-filled with
the Klenow fragment of DNA polymerase I, digested with XbaI at the
site present in the 5'UTR-3 primer (shown underlined) and inserted
between the XbaI and SmaI sites present in the plasmid generated in the
first stage. 

p729 was linearized with HindIII and a 729 nucleotide antisense probe
was prepared by transcription with SP6 RNA polymerase. The in vitro
transcription reaction mixture contained 50 mM Tris-HCl pH 8.0, 10 mM
MgCl2, 4 mM spermidine, 10 mM NaCl, 0.5 mM each of ATP, GTP,
CTP, 12 µM UTP, 5 nO [32P]UTP, 10 mM dithiothreitol, 20 units of
RNAguard, 0.5 µg of linearized template and 10 units of SP6 RNA
polymerase in a final volume of 20 µl. After incubation at 37°C for
60 min, the DNA template was digested with DNase I and the RNA
probe was extracted with phenol/chloroform, precipitated with ethanol 
and resuspended in water. This RNA probe covered the entire p53 gene 
promoter region and included the first three exons and a part of the 
fourth exon. p53 transcripts initiating from one of the major start sites 
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should yield protected fragments of -385 nucleotides. p53 transcripts 
originating from the most 5' of the minor start sites should yield 
protected fragments of 449 nucleotides (Tuck and Crawford, 1989).

In the RNase protection assay, 30 ng of total RNA was mixed with
1 × 10* c.p.m. of the labelled probe and precipitated with ethanol. The
RNA/probe mixtures were then washed, dried and resuspended in 10 µl
of hybridization solution (Winter et al., 1985), heated to S0*C for 10
min, and hybridized at 46°C overnight. After hybridization, the samples
were mixed with 0-IH ml of RNase digestion mix containing 60 µg/ml
of RNase A (type III, Sigma), 1100 U/ml of RNase T1 (Boehringer
Mannheim) in .W0 mM NaCl, 5 mM EDTA, 10 mM Tris-HCl pH 7.5.
After incubation at 37°C for 60 min, the digestion was terminated by
addition of 10 µl of 20% SDS and 5 µl of proteinase K (10 mg/ml)
(Boehringer Mannheim) and incubation at 37°C for 15 min. Protected
fragments were extracted with phenol/chloroform, precipitated with
ethanol, resolved by denaturing gel electrophoresis and visualized by
autoradiography. 

Polysome analysis 

5Xl() ? cells were washed once in ice-cold Tris-saline solution (25 mM
Tris-HCl pH 7.5, 25 mM NaCl) containing 10 mM MgCl2 and 10 µg/ml
cycloheximide. The cells were then immediately lysed on ice with the
use of a Dounce homogenizer in 2 ml homogenization buffer containing
25 mM Tris-HCl pH 7.5, 15 mM NaCl, 10 mM MgCl2, 1% Triton
X-100, 3-10 U/ml heparin (LEO Laboratories Canada Ltd), 2 mM
vanadyl ribonucleoside complex (Sigma), 2.5 mM PMSF, 10 µg/ml
cycloheximide, 1 mM dithiothreitol and 1 mM ECTA. The extract was
centrifuged at H 000 r.p.m. for 6 min at 4°C to remove cell debris, the
supernatant was collected and layered over a 15-50% linear sucrose
gradient (11 ml) prepared in homogenization buffer. The gradients were
centrifuged in an SW41 Beckman rotor at 175 000 g for 110 min at
4°C. Ten fractions of equal volume were collected from the bottom of
the tubes. RNA was prepared from each of the fractions by phenol/
chloroform extraction and ethanol precipitation and resuspended in
200 µl DEPC-treated water. The amount of p53 mRNA in each fraction
(100 µl of the RNA sample) was determined by dot-blot hybridization
analysis using a 32P-labelled human p53 cDNA probe. Polysomes used
to calibrate the gradients were prepared in exactly the same way except
for an additional purification step involving precipitation of the polysomes
present in the homogenate with 100 mM MgCl2 for 1 h on ice before
sucrose gradient sedimentation. For calibration, OJ-ml fractions were
collected from the bottom of the gradient and Aim of each fraction was
determined. 
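
One way to summarise the dot-blot readings from such a gradient is to express each fraction as a percentage of the total p53 mRNA signal and then sum the fractions judged to be polysomal. The Python sketch below is illustrative only: the signal values are invented, and the choice of which fractions count as polysomal is an assumption that would in practice come from the calibration gradient.

```python
def fraction_percentages(signals):
    """Convert per-fraction hybridization signals to percent of the total."""
    total = sum(signals)
    if total <= 0:
        raise ValueError("total signal must be positive")
    return [100.0 * s / total for s in signals]


def polysomal_share(signals, polysomal_indices):
    """Return the percent of total signal found in the given fractions.

    polysomal_indices -- zero-based indices of the fractions treated as
    polysomal (an assumption supplied by the caller).
    """
    percentages = fraction_percentages(signals)
    return sum(percentages[i] for i in polysomal_indices)


if __name__ == "__main__":
    # Hypothetical signals for ten fractions, listed from the bottom
    # (heaviest) to the top (lightest) of the gradient.
    signals = [10, 20, 30, 40, 50, 20, 10, 10, 5, 5]
    print(fraction_percentages(signals))
    # Assume, for illustration, that the five bottom fractions are polysomal.
    print(polysomal_share(signals, range(5)))
```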


Templates for in vitro transcription and translation
Plasmid p2516 contains nearly full-length human wild-type p53 cDNA
and was constructed by the correct ligation of three cDNA fragments.
One fragment corresponding to the 5' end of the p53 transcript was
obtained from pR4-2 (Harlow et al., 1985) after digestion with XbaI
and PvuII, which cut in exons 1 and 5, respectively. The middle fragment
was obtained from pProSp53 (Matlashewski et al., 1987) after digestion
with PvuII and BamHI, which cut in exons 5 and 11, respectively. The
third fragment corresponding to the 3' end of the p53 transcript was
obtained by RT-PCR amplification of the 3'UTR of p53 mRNA using
p53-specific oligonucleotides as primers, 3'SX13 (sense, exon 11,
GTCACCCCATCCCACACCCTGG) and AS-4. The PCR-amplified
fragment was end-filled with the Klenow fragment of DNA polymerase
I and digested at an internal BamHI site. These three fragments, which
represent contiguous sequences of the native p53 transcript, were inserted
between the XbaI and SmaI sites of a modified form of the pSP64 vector
(Promega) in which polylinker sequences between the HindIII site and
the XbaI site were deleted. The resulting plasmid is referred to as
p2516 and yields a p53 transcript in vitro starting with the sequence

5'0AATACAA0d£3A2A y - W/m J neariv 

identical to p53 transcripts originating from the most 3' of the major
transcription initiation sites in vivo which start with 5'CAAAAOIdA^
OA..,, J' (Tuck and Crawford, 1989). The beginning of identity corres-
ponding to an XbaI site in the cDNA is underlined. Digestion of p2516
with EcoRI provides a template that can produce a synthetic full-length
p53 transcript of 2516 nucleotides. Digestion with BamHI provides a
template for a truncated p53 transcript of 2034 nucleotides that is missing
sequences from the 3'UTR containing the Alu-like element. 

The plasmid pCAT-Alu was constructed in two steps. First, the 
chloramphenicol acetyltransferase gene was excised from the CAT
plasmid (Fu et al., 1991) with HindIII and BamHI, and inserted into
pSP64 to generate pSP6CAT. Second, the BamHI-EcoRI fragment from
p2516 that contains the Alu-like element present in the p53 3'UTR was



inserted immediately downstream of the CAT gene. The plasmid pCAT-
BS was constructed by removing the SmaI-BamHI fragment of the p53
3'UTR present in p2516 and inserting this fragment in reverse orientation
into pSP6CAT immediately downstream of CAT. This SmaI-BamHI
fragment is missing the Alu-like element present at the distal end of the
p53 3'UTR.

In vitro transcription and in vitro polyadenylation
Plasmid DNAs containing templates for in vitro transcription were
linearized at selected restriction endonuclease sites. Standard transcription
assays (Melton et al., 1984) were performed as described above for the
preparation of antisense RNA probes with the omission of [32P]UTP.
0.5 mM m7G(5')ppp(5')G and 0.05 mM GTP were included in the
reactions to provide efficient capping at the 5' end of synthetic transcripts.
Polyadenylation reactions contained synthetic RNA, 0.2 mM ATP, 50 mM
Tris-HCl pH 8.0, 10 mM MgCl2, 250 mM NaCl, 2 mM MnCl2, 2 mM
dithiothreitol, 1 unit/µl RNAguard (Pharmacia), 500 µg/ml of BSA
(Pharmacia) and 5 units of poly(A) polymerase (Pharmacia) in a 50 µl
final volume (McGrew et al., 1989). After 30 min at 37°C, polyadenylated
RNAs were purified by phenol/chloroform extraction and ethanol pre- 
cipitation. 

In vitro translation and immunoprecipitation
Synthetic transcripts were translated in micrococcal-nuclease-treated
rabbit reticulocyte lysates (Promega) under the conditions recommended
by the supplier. Reactions containing p53 transcripts were incubated for
30 min at 30°C in the presence of [35S]methionine and stopped by
addition of dithiothreitol to a final concentration of 1 mM and EDTA
pH 8.0 to a final concentration of 10 mM. Each reaction was then
divided into two aliquots, one for immunoprecipitation with the p53-
specific monoclonal antibody PAb421 and the other for immunoprecipit-
ation with a control antibody PAb419. Reactions containing CAT or
luciferase transcripts were incubated for 30 min at 30°C in the presence
of [35S]methionine and were stopped by addition of protein sample buffer,
boiled for 5 min and resolved by polyacrylamide gel electrophoresis.
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Expression of cytochrome P4502E1 in human liver: assessment by 
mRNA, genotype and phenotype. 

Powell H, Kitteringham NR, Pirmohamed M, Smith DA, Park BK. 

Department of Pharmacology and Therapeutics, The University of Liverpool, UK. 

Cytochrome P4502E1 (CYP2E1) is constitutively expressed in human liver and is 
responsible for the metabolic bioactivation of a wide variety of xenobiotics, 
including a number of protoxins and procarcinogens. CYP2E1 expression is 
regulated at several levels including pre-transcriptional, transcriptional and 
post-transcriptional levels, and any variation in enzyme concentration and hence
activity may represent increased risk of toxicity or carcinogenicity. We have 
investigated variability in the levels of CYP2E1 mRNA, protein and functional 
activity in a human liver bank, and attempted to relate these parameters to the 
RsaI restriction fragment length polymorphism in the 5'-flanking region. Variation
in CYP2E1 mRNA (18-fold) was greater than the variation seen in CYP2E1 
protein (twofold) and functional activity (fourfold) determined using two probe 
substrates, chlorzoxazone and p-nitrophenol. Although protein and functional 
activity showed a significant correlation (r = 0.93 and r = 0.83 for chlorzoxazone 
and p-nitrophenol, respectively), there was no correlation between any of these 
parameters and mRNA levels. Also, the variation in CYP2E1 activity could not be
directly accounted for by the RsaI polymorphism in our samples. In conclusion,
our results are consistent with a complex regulation of CYP2E1 and the fact that it 
is highly conserved in the human population. The absence of a relationship 
between the RsaI polymorphism and CYP2E1 activity is consistent with other
studies performed in Caucasians, but does not exclude an effect of this
polymorphism on inducibility of CYP2E1.
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Evidence of tissue-specific, post-transcriptional reguiation of 
NRF-2 expression. 

Vallejo CG, Escriva H, Rodriguez-Pena A.

Instituto de Investigaciones Biomedicas 'Alberto Sols' (CSIC-UAM), Arturo 
Duperier, 4, 28029 Madrid, Spain. cvallejo@iib.uam.es

Mitochondrial respiratory function requires the expression of genes both from the 
mitochondrial and nuclear genomes. Nuclear respiratory factor 2 (NRF-2) is a 
transcription factor required for the expression of several nuclear-encoded 
mitochondrial proteins, including the specific mitochondrial transcription factor 
Tfam. This makes NRF-2 a likely candidate to coordinate expression of 
mitochondrial components. NRF-2 is a multisubunit complex of which the alpha 
subunit binds DNA and the beta subunit enhances this binding, respectively. We 
have analysed in vivo the expression patterns of NRF-2 subunits both at the
mRNA and protein level, in three rat tissues, liver, testis and brain. In contrast 
with Tfam or the 'housekeeping' beta-actin expressions in which a parallel 
gradient was observed, no correlation was found between NRF-2 mRNAs and 
protein levels, thus suggesting post-transcriptional regulation.

PMID: 11120355 [PubMed - indexed for MEDLINE]
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An examination of the effects of hypoxia, acidosis, and glucose 
starvation on the expression of metastasis-associated genes in 
murine tumor cells. 

Jang A, Hill RP. 

Ontario Cancer Institute, and Department of Medical Biophysics, University of 
Toronto, Canada. 

Tumor cells exposed to a growth stress such as low pH, glucose starvation and 
hypoxia have been shown to exhibit a transient increase in experimental 
metastatic potential, particularly when allowed to recover under normal growth 
conditions for a period of 24-48 h. In this study we examined whether this 
increase in metastatic ability could be explained by changes in the expression of a 
number of different metastasis-associated genes, when the cells were exposed to 
similar conditions (24-48 h exposure to the stress condition followed by 0-48 h
recovery under normal growth conditions). Although the cell lines used (KHT 
fibrosarcoma, SCC VII squamous cell carcinoma, and B16F1 melanoma)
demonstrated altered metastatic ability after the treatment, no overall temporal 
correlation between changes in the mRNA levels for cathepsin B, cathepsin L, 
nm23, TIMP-1, osteopontin, or VEGF and metastatic ability in the three cell lines
was observed. The production of gelatinase A (72 kDa collagenase) and gelatinase 
B (92 kDa collagenase) was also measured by gelatin zymography. There was an 
increase in production of these enzymes with increasing recovery time, but it did 
not parallel changes in metastatic potential. Although these results suggest that the 
products of most of the genes studied may not be involved in the transient 
metastatic changes, further studies are required to establish whether changes in 
protein levels track with changes in mRNA levels for these genes.

PMID: 9247250 [PubMed - indexed for MEDLINE]
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WISP genes are members of the connective tissue growth factor 
family that are up-regulated in Wnt-1-transformed cells and
aberrantly expressed in human colon tumors 
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ABSTRACT Wnt family members are critical to many 
developmental processes, and components of the Wnt signal-
ing pathway have been linked to tumorigenesis in familial and
sporadic colon carcinomas. Here we report the identification
of two genes, WISP-1 and WISP-2, that are up-regulated in the
mouse mammary epithelial cell line C57MG transformed by
Wnt-1, but not by Wnt-4. Together with a third related gene,
WISP-3, these proteins define a subfamily of the connective
tissue growth factor family. Two distinct systems demon-
strated WISP induction to be associated with the expression of
Wnt-1. These included (i) C57MG cells infected with a Wnt-1
retroviral vector or expressing Wnt-1 under the control of a
tetracycline-repressible promoter, and (ii) Wnt-1 transgenic
mice. The WISP-1 gene was localized to human chromosome
8q24.1-8q24.3. WISP-1 genomic DNA was amplified in colon
cancer cell lines and in human colon tumors and its RNA
overexpressed (2- to >30-fold) in 84% of the tumors examined
compared with patient-matched normal mucosa. WISP-3
mapped to chromosome 6q22-6q23 and also was overex-
pressed (4- to >40-fold) in 63% of the colon tumors analyzed.
In contrast, WISP-2 mapped to human chromosome 20q12-
20q13 and its DNA was amplified, but RNA expression was
reduced (2- to >30-fold) in 79% of the tumors. These results
suggest that the WISP genes may be downstream of Wnt-1
signaling and that aberrant levels of WISP expression in colon
cancer may play a role in colon tumorigenesis.

Wnt-1 is a member of an expanding family of cysteine-rich,
glycosylated signaling proteins that mediate diverse develop-
mental processes such as the control of cell proliferation,
adhesion, cell polarity, and the establishment of cell fates (1,
2). Wnt-1 originally was identified as an oncogene activated by
the insertion of mouse mammary tumor virus in virus-induced
mammary adenocarcinomas (3, 4). Although Wnt-1 is not
expressed in the normal mammary gland, expression of Wnt-1
in transgenic mice causes mammary tumors (5).

In mammalian cells, Wnt family members initiate signaling 
by binding to the seven-transmembrane spanning Frizzled 
receptors and recruiting the cytoplasmic protein Dishevelled 
(Dsh) to the cell membrane (1, 2, 6). Dsh then inhibits the
kinase activity of the normally constitutively active glycogen
synthase kinase-3β (GSK-3β) resulting in an increase in
β-catenin levels. Stabilized β-catenin interacts with the tran-
scription factor TCF/Lef1, forming a complex that appears in
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the nucleus and binds TCF/Lefl target DNA elements to 
activate transcription (7, 8). Other experiments suggest that 
the adenomatous polyposis coli (APC) tumor suppressor gene 
also plays an important role in Wnt signaling by regulating 
β-catenin levels (9). APC is phosphorylated by GSK-3β, binds
to β-catenin, and facilitates its degradation. Mutations in
either APC or β-catenin have been associated with colon
carcinomas and melanomas, suggesting these mutations con- 
tribute to the development of these types of cancer, implicating 
the Wnt pathway in tumorigenesis (1). 

Although much has been learned about the Wnt signaling 
pathway over the past several years, only a few of the tran- 
scriptionally activated downstream components activated by 
Wnt have been characterized. Those that have been described
cannot account for all of the diverse functions attributed to 
Wnt signaling. Among the candidate Wnt target genes are 
those encoding the nodal-related 3 gene, Xnr3, a member of
the transforming growth factor (TGF)-β superfamily, and the
homeobox genes, engrailed, goosecoid, twin (Xtwn), and siamois
(2). A recent report also identifies c-myc as a target gene of the 
Wnt signaling pathway (10). 

To identify additional downstream genes in the Wnt signal- 
ing pathway that are relevant to the transformed cell pheno- 
type, we used a PCR-based cDNA subtraction strategy, sup- 
pression subtractive hybridization (SSH) (11), using RNA 
isolated from C57MG mouse mammary epithelial cells and 
C57MG cells stably transformed by a Wnt-1 retrovirus. Over-
expression of Wnt-1 in this cell line is sufficient to induce a
partially transformed phenotype, characterized by elongated 
and refractile cells that lose contact inhibition and form a 
multilayered array (12, 13). We reasoned that genes differen-
tially expressed between these two cell lines might contribute
to the transformed phenotype.

In this paper, we describe the cloning and characterization
of two genes up-regulated in Wnt-1 transformed cells, WISP-1
and WISP-2, and a third related gene, WISP-3. The WISP genes
are members of the CCN family of growth factors, which
includes connective tissue growth factor (CTGF), Cyr61, and
nov, a family not previously linked to Wnt signaling.

MATERIALS AND METHODS 

SSH. SSH was performed by using the PCR-Select cDNA
Subtraction Kit (CLONTECH). Tester double-stranded 

Abbreviations: TGF, transforming growth factor; CTGF, connective
tissue growth factor; SSH, suppression subtractive hybridization;
VWC, von Willebrand factor type C module.
Data deposition: The sequences reported in this paper have been
deposited in the GenBank database (accession nos. AF100777,
AF100778, AF100779, AF100780, and AF100781).
To whom reprint requests should be addressed. e-mail: dlane@gene.com.
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cDNA was synthesized from 2 p.g of poly(A) + RNA Isolated 
from the C57MG/Wnt-1 cell line and driver cDNA from 2 µg
of poly(A)+ RNA from the parent C57MG cells. The sub-
tracted cDNA library was subcloned into a pGEM-T vector for
further analysis. 

cDNA Library Screening. Clones encoding full-length
mouse WISP-1 were isolated by screening a λgt10 mouse
embryo cDNA library (CLONTECH) with a 70-bp probe from
the original partial clone 568 sequence corresponding to amino
acids 128-169. Clones encoding full-length human WISP-1
were isolated by screening λgt10 lung and fetal kidney cDNA
libraries with the same probe at low stringency. Clones en-
coding full-length mouse and human WISP-2 were isolated by
screening a C57MG/Wnt-1 or human fetal lung cDNA library
with a probe corresponding to nucleotides 1463-1512. Full-
length cDNAs encoding WISP-3 were cloned from human
bone marrow and fetal kidney libraries.

Expression of Human WISP RNA. PCR amplification of
first-strand cDNA was performed with human Multiple Tissue
cDNA panels (CLONTECH) and 300 µM of each dNTP at
94°C for 1 sec, 62°C for 30 sec, 72°C for 1 min, for 22-32 cycles.
WISP and glyceraldehyde-3-phosphate dehydrogenase primer
sequences are available on request.

In Situ Hybridization. "P-labeled sense and antisense ribo- 
probes were transcribed from an 897-bp PCR product corre-
sponding to nucleotides 601-1440 of mouse WISP-1 or a
294-bp PCR product corresponding to nucleotides 82-375 of 
mouse WISP-2. All tissues were processed as described (40). 

Radiation Hybrid Mapping. Genomic DNA from each 
hybrid in the Stanford G3 and Genebridge4 Radiation Hybrid 
Panels (Research Genetics, Huntsville, AL) and human and 
hamster control DNAs were PCR-amplified, and the results
were submitted to the Stanford or Massachusetts Institute of 
Technology web servers. 

Cell Lines, Tumors, and Mucosa Specimens. Tissue speci- 
mens were obtained from the Department of Pathology (Uni- 
versity of Pittsburgh) for patients undergoing colon resection 
and from the University of Leeds, United Kingdom. Genomic 
DNA was isolated (Qiagen) from the pooled blood of 10 
normal human donors, surgical specimens, and the following 
ATCC human cell lines: SW480, COLO 320DM, HT-29, 
WiDr, and SW403 (colon adenocarcinomas), SW620 (lymph 
node metastasis, colon adenocarcinoma), HCT 116 (colon 
carcinoma), SK-CO-1 (colon adenocarcinoma, ascites), and
HM7 (a variant of ATCC colon adenocarcinoma cell line LS
174T). DNA concentration was determined by using Hoechst
dye 33258 intercalation fluorimetry. Total RNA was prepared
by homogenization in 7 M GuSCN followed by centrifugation
over CsCl cushions or prepared by using RNAzol.

Gene Amplification and RNA Expression Analysis. Relative 
gene amplification and RNA expression of WISPs and c-myc in
the cell lines, colorectal tumors, and normal mucosa were 
determined by quantitative PCR. Gene-specific primers and 
fluorogenic probes (sequences available on request) were 
designed and used to amplify and quantitate the genes. The 
relative gene copy number was derived by using the formula
2^(Δct), where Δct represents the difference in amplification
cycles required to detect the WISP genes in peripheral blood
lymphocyte DNA compared with colon tumor DNA or colon
tumor RNA compared with normal mucosal RNA. The
δ-method was used for calculation of the SE of the gene copy
number or RNA expression level. The WISP-specific signal was
normalized to that of the glyceraldehyde-3-phosphate dehy-
drogenase housekeeping gene. All TaqMan assay reagents
were obtained from Perkin-Elmer Applied Biosystems.
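
The cycle-difference calculation described above can be sketched as follows. The paper states only that the relative level is two raised to the cycle difference and that the signal was normalized to the housekeeping gene; subtracting the housekeeping gene's own cycle difference before exponentiating is one common way to do that and is an assumption here, as are the function name and the example cycle numbers.

```python
def relative_level(ct_target_ref, ct_target_test, ct_house_ref, ct_house_test):
    """Return the relative copy number or expression level of a target gene.

    ct_target_ref  -- detection cycle of the target in the reference sample
                      (e.g. blood DNA or normal mucosa RNA)
    ct_target_test -- detection cycle of the target in the test sample
                      (e.g. tumor DNA or tumor RNA)
    ct_house_ref, ct_house_test -- the same two readings for the
                      housekeeping gene used for normalization

    Assumption: normalization is done by subtracting the housekeeping
    gene's cycle difference from the target's cycle difference.
    """
    delta_target = ct_target_ref - ct_target_test
    delta_house = ct_house_ref - ct_house_test
    return 2.0 ** (delta_target - delta_house)


if __name__ == "__main__":
    # Hypothetical readings: the target is detected 3 cycles earlier in the
    # tumor sample, while the housekeeping gene is unchanged.
    print(relative_level(28.0, 25.0, 20.0, 20.0))
    # Same target readings, but the housekeeping gene is also detected
    # 1 cycle earlier in the tumor sample, which lowers the estimate.
    print(relative_level(28.0, 25.0, 20.0, 19.0))
```

A result below 1 from the same function would indicate reduced copy number or expression in the test sample relative to the reference.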

RESULTS 

Isolation of WISP-1 and WISP-2 by SSH. To identify
Wnt-1-inducible genes, we used the technique of SSH using the
mouse mammary epithelial cell line C57MG and C57MG cells 
that stably express Wnt-1 (11). Candidate differentially ex- 
pressed cDNAs (1,384 total) were sequenced. Thirty-nine 
percent of the sequences matched known genes or homo-
logues, 32% matched expressed sequence tags, and 29% had
no match. To confirm that the transcript was differentially 
expressed, semiquantitative reverse transcription-PCR and 
Northern analysis were performed by using mRNA from the 
C57MG and C57MG/Wnt-1 cells. 

Two of the cDNAs, WISP-1 and WISP-2, were differentially
expressed, being induced in the C57MG/Wnt-1 cell line, but 
not in the parent C57MG cells or C57MG cells overexpressing 
Wnt-4 (Fig. 1 A and B). Wnt-4, unlike Wnt-1, does not induce 
the morphological transformation of C57MG cells and has no 
effect on β-catenin levels (13, 14). Expression of WISP-1 was
up-regulated approximately 3-fold in the C57MG/Wnt-1 cell
line and WISP-2 by approximately 5-fold by both Northern
analysis and reverse transcription-PCR. 

An independent, but similar, system was used to examine 
WISP expression after Wnt-1 induction. C57MG cells express- 
ing the Wnt-1 gene under the control of a tetracycline- 
repressible promoter produce low amounts of Wnt-1 in the 
repressed state but show a strong induction of Wnt-1 mRNA 
and protein within 24 hr after tetracycline removal (8). The 
levels of Wnt-1 and WISP RNA isolated from these cells at 
various times after tetracycline removal were assessed by
quantitative PCR. Strong induction of Wnt-1 mRNA was seen 
as early as 10 hr after tetracycline removal. Induction of WISP 
mRNA (2- to 6-fold) was seen at 48 and 72 hr (data not shown). 
These data support our previous observations that show that 
WISP induction is correlated with Wnt-1 expression. Because 
the induction is slow, occurring after approximately 48 hr, the 
induction of WISPs may be an indirect response to Wnt-1 
signaling. 

cDNA clones of human WISP-1 were isolated and the 
sequence compared with mouse WISP-1. The cDNA sequences 
of mouse and human WISP-1 were 1,766 and 2,830 bp in length, 
respectively, and encode proteins of 367 aa, with predicted 
relative molecular masses of ~40,000 (M_r 40 K). Both have 
hydrophobic N-terminal signal sequences, 38 conserved cysteine 
residues, and four potential N-linked glycosylation sites 
and are 84% identical (Fig. 2A). 

Full-length cDNA clones of mouse and human WISP-2 were 
1,734 and 1,293 bp in length, respectively, and encode proteins 
of 251 and 250 aa, respectively, with predicted relative molecular 
masses of ~27,000 (M_r 27 K) (Fig. 2B). Mouse and human 
WISP-2 are 73% identical. Human WISP-2 has no potential 
N-linked glycosylation sites, and mouse WISP-2 has one at 
position 197. WISP-2 has 28 cysteine residues that are conserved 
among the 38 cysteines found in WISP-1. 

Fig. 1. WISP-1 and WISP-2 are induced by Wnt-1, but not Wnt-4, 
expression in C57MG cells. Northern analysis of WISP-1 (A) and 
WISP-2 (B) expression in C57MG, C57MG/Wnt-1, and C57MG/Wnt-4 
cells. Poly(A)+ RNA (2 μg) was subjected to Northern blot 
analysis and hybridized with a 70-bp mouse WISP-1-specific probe 
(amino acids 278-300) or a 190-bp WISP-2-specific probe (nucleotides 
1438-1627) in the 3' untranslated region. Blots were rehybridized with 
human β-actin probe. 

Fig. 2. Encoded amino acid sequence alignment of mouse and 
human WISP-1 (A) and mouse and human WISP-2 (B). The potential 
signal sequence, insulin-like growth factor-binding protein (IGF-BP), 
VWC, thrombospondin (TSP), and C-terminal (CT) domains are 
underlined. 

Identification of WISP-3. To search for related proteins, we 
screened expressed sequence tag (EST) databases with the 
WISP-1 protein sequence and identified several ESTs as 
potentially related sequences. We identified a homologous 
protein that we have called WISP-3. A full-length human 
WISP-3 cDNA of 1,371 bp was isolated corresponding to those 
ESTs that encode a 354-aa protein with a predicted molecular 
mass of 39,293. WISP-3 has two potential N-linked glycosylation 
sites and 36 cysteine residues. An alignment of the three 
human WISP proteins shows that WISP-1 and WISP-3 are the 
most similar (42% identity), whereas WISP-2 has 37% identity 
with WISP-1 and 32% identity with WISP-3 (Fig. 3A). 

WISPs Are Homologous to the CTGF Family of Proteins. 
Human WISP-1, WISP-2, and WISP-3 are novel sequences; 
however, mouse WISP-1 is the same as the recently identified 
Elm1 gene. Elm1 is expressed in low, but not high, metastatic 
mouse melanoma cells, and suppresses the in vivo growth and 
metastatic potential of K-1735 mouse melanoma cells (15). 
Human and mouse WISP-2 are homologous to the recently 
described rat gene, rCop-1 (16). Significant homology (36-44%) 
was seen to the CCN family of growth factors. This family 
includes three members, CTGF, Cyr61, and the protooncogene 
nov. CTGF is a chemotactic and mitogenic factor for 
fibroblasts that is implicated in wound healing and fibrotic 
disorders and is induced by TGF-β (17). Cyr61 is an extracellular 
matrix signaling molecule that promotes cell adhesion, 
proliferation, migration, angiogenesis, and tumor growth (18, 
19). nov (nephroblastoma overexpressed) is an immediate 
early gene associated with quiescence and found altered in 
Wilms tumors (20). The proteins of the CCN family share 
functional, but not sequence, similarity to Wnt-1. All are 
secreted, cysteine-rich heparin binding glycoproteins that associate 
with the cell surface and extracellular matrix. 

WISP proteins exhibit the modular architecture of the CCN 
family, characterized by four conserved cysteine-rich domains 
(Fig. 3B) (21). The N-terminal domain, which includes the first 
12 cysteine residues, contains a consensus sequence (GCGCCXXC) 
conserved in most insulin-like growth factor (IGF)-binding 
proteins (BP). This sequence is conserved in WISP-2 
and WISP-3, whereas WISP-1 has a glutamine in the third 
position instead of a glycine. CTGF recently has been shown 
to specifically bind IGF (22) and a truncated nov protein 
lacking the IGF-BP domain is oncogenic (23). The von Willebrand 
factor type C module (VWC), also found in certain 
collagens and mucins, covers the next 10 cysteine residues, and 
is thought to participate in protein complex formation and 
oligomerization (24). The VWC domain of WISP-3 differs 
from all CCN family members described previously, in that it 
contains only six of the 10 cysteine residues (Fig. 3 A and B). 
A short variable region follows the VWC domain. The third 
module, the thrombospondin (TSP) domain, is involved in 
binding to sulfated glycoconjugates and contains six cysteine 
residues and a conserved WSxCSxxCG motif first identified in 
thrombospondin (25). The C-terminal (CT) module containing 
the remaining 10 cysteines is thought to be involved in 
dimerization and receptor binding (26). The CT domain is 
present in all CCN family members described to date, but is 
absent in WISP-2 (Fig. 3 A and B). The existence of a putative 
signal sequence and the absence of a transmembrane domain 
suggest that WISPs are secreted proteins, an observation 
supported by an analysis of their expression and secretion from 
mammalian cell and baculovirus cultures (data not shown). 

Fig. 3. (A) Encoded amino acid sequence alignment of human 
WISPs. The cysteine residues of WISP-1 and WISP-2 that are not 
present in WISP-3 are indicated with a dot. (B) Schematic representation 
of the WISP proteins showing the domain structure and cysteine 
residues (vertical lines). The four cysteine residues in the VWC domain 
that are absent in WISP-3 are indicated with a dot. (C) Expression of 
WISP mRNA in human tissues. PCR was performed on human 
multiple-tissue cDNA panels (CLONTECH) from the indicated adult 
and fetal tissues. 
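The two cysteine motifs named in the preceding paragraph can be written as regular expressions. The following is a minimal sketch, assuming that "X" or "x" in each consensus stands for any single residue; the function name and the toy peptide are hypothetical and are not taken from the WISP sequences.

```python
import re

# Assumption: "X"/"x" in the published consensus means any one amino acid.
IGFBP_MOTIF = re.compile(r"GCGCC..C")   # GCGCCXXC (IGF-BP domain)
TSP_MOTIF = re.compile(r"WS.CS..CG")    # WSxCSxxCG (TSP domain)


def find_motif(pattern, protein):
    """Return the 1-based start position of every match of pattern."""
    return [m.start() + 1 for m in pattern.finditer(protein)]


# Hypothetical toy peptide, for illustration only.
toy = "MAAGCGCCKVCAQWSPCSTTCGK"
print(find_motif(IGFBP_MOTIF, toy))
print(find_motif(TSP_MOTIF, toy))

# A glutamine at the third position (as described for WISP-1) does not
# satisfy the strict consensus.
print(find_motif(IGFBP_MOTIF, "GCQCCKVC"))
```

A strict pattern of this kind reports only exact consensus matches, so a sequence with a single substitution would need a relaxed pattern to be found.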

Expression of WISP mRNA in Human Tissues. Tissue-specific 
expression of human WISPs was characterized by PCR 
analysis on adult and fetal multiple tissue cDNA panels. 
WISP-1 expression was seen in the adult heart, kidney, lung, 
pancreas, placenta, ovary, small intestine, and spleen (Fig. 3C). 
Little or no expression was detected in the brain, liver, skeletal 
muscle, colon, peripheral blood leukocytes, prostate, testis, or 
thymus. WISP-2 had a more restricted tissue expression and 
was detected in adult skeletal muscle, colon, ovary, and fetal 
lung. Predominant expression of WISP-3 was seen in adult 
kidney and testis and fetal kidney. Lower levels of WISP-3 
expression were detected in placenta, ovary, prostate, and 
small intestine. 

In Situ Localization of WISP-1 and WISP-2. Expression of 
WISP-1 and WISP-2 was assessed by in situ hybridization in 
mammary tumors from Wnt-1 transgenic mice. Strong expression 
of WISP-1 was observed in stromal fibroblasts lying within 
the fibrovascular tumor stroma (Fig. 4 A-D). However, low-level 
WISP-1 expression also was observed focally within tumor 
cells (data not shown). No expression was observed in normal 
breast. Like WISP-1, WISP-2 expression also was seen in the 
tumor stroma in breast tumors from Wnt-1 transgenic animals 
(Fig. 4 E-H). However, WISP-2 expression in the stroma was 
in spindle-shaped cells adjacent to capillary vessels, whereas 
the predominant cell type expressing WISP-1 was the stromal 
fibroblasts. 

Fig. 4. (A, C, E, and G) Representative hematoxylin/eosin-stained 
images from breast tumors in Wnt-1 transgenic mice. The corresponding 
dark-field images showing WISP-1 expression are shown in B and 
D. The tumor is a moderately well-differentiated adenocarcinoma 
showing evidence of adenoid cystic change. At low power (A and B), 
expression of WISP-1 is seen in the delicate branching fibrovascular 
tumor stroma (arrowhead). At higher magnification, expression is seen 
in the stromal(s) fibroblasts (C and D), and tumor cells are negative. 
Focal expression of WISP-1, however, was observed in tumor cells in 
some areas. Images of WISP-2 expression are shown in E-H. At low 
power (E and F), expression of WISP-2 is seen in cells lying within the 
fibrovascular tumor stroma. At higher magnification, these cells 
appeared to be adjacent to capillary vessels whereas tumor cells are 
negative (G and H). 

Chromosome Localization of the WISP Genes. The chromosomal 
location of the human WISP genes was determined 
by radiation hybrid mapping panels. WISP-1 is approximately 
3.48 cR from the meiotic marker AFM259xc5 [logarithm of 
odds (lod) score 16.31] on chromosome 8q24.1 to 8q24.3, in the 
same region as the human locus of the novH family member 
(27) and roughly 4 Mbs distal to c-myc (28). Preliminary fine 
mapping indicates that WISP-1 is located near D8S1712 STS. 
WISP-2 is linked to the marker SHGC-33922 (lod = 1,000) on 
chromosome 20q12-20q13.1. Human WISP-3 mapped to chromosome 
6q22-6q23 and is linked to the marker AFM211ze5 
(lod = 1,000). WISP-3 is approximately 18 Mbs proximal to 
CTGF and 23 Mbs proximal to the human cellular oncogene 
MYB (27, 29). 

Amplification and Aberrant Expression of WISPs in Human 
Colon Tumors. Amplification of protooncogenes is seen in 
many human tumors and has etiological and prognostic significance. 
For example, in a variety of tumor types, c-myc 
amplification has been associated with malignant progression 
and poor prognosis (30). Because WISP-1 resides in the same 
general chromosomal location (8q24) as c-myc, we asked 
whether it was a target of gene amplification, and, if so, 
whether this amplification was independent of the c-myc locus. 
Genomic DNA from human colon cancer cell lines was 
assessed by quantitative PCR and Southern blot analysis (Fig. 
5 A and B). Both methods detected similar degrees of WISP-1 
amplification. Most cell lines showed significant (2- to 4-fold) 
amplification, with the HT-29 and WiDr cell lines demonstrating 
an 8-fold increase. Significantly, the pattern of amplification 
observed did not correlate with that observed for c-myc, 
indicating that the c-myc gene is not part of the amplicon that 
involves the WISP-1 locus. 

We next examined whether the WISP genes were amplified 
in a panel of 25 primary human colon adenocarcinomas. The 
relative WISP gene copy number in each colon tumor DNA 
was compared with pooled normal DNA from 10 donors by 
quantitative PCR (Fig. 6). The copy number of WISP-1 and 
WISP-2 was significantly greater than one, approximately 
2-fold for WISP-1 in about 60% of the tumors and 2- to 4-fold 
for WISP-2 in 92% of the tumors (P < 0.001 for each). The 
copy number for WISP-3 was indistinguishable from one (P = 
0.166). In addition, the copy number of WISP-2 was significantly 
higher than that of WISP-1 (P < 0.001). 

The levels of WISP transcripts in RNA isolated from 19 
adenocarcinomas and their matched normal mucosa were 
assessed by quantitative PCR (Fig. 7). The level of WISP-1 
RNA present in tumor tissue varied but was significantly 
increased (2- to >25-fold) in 84% (16/19) of the human colon 
tumors examined compared with normal adjacent mucosa. 
Four of 19 tumors showed greater than 10-fold overexpression. 
In contrast, in 79% (15/19) of the tumors examined, WISP-2 
RNA expression was significantly lower in the tumor than the 
mucosa. Similar to WISP-1, WISP-3 RNA was overexpressed in 
63% (12/19) of the colon tumors compared with the normal 
mucosa. The amount of overexpression of WISP-3 ranged from 
4- to >40-fold. 

Fig. 5. Amplification of WISP-1 genomic DNA in colon cancer cell 
lines. (A) Amplification in cell line DNA was determined by quantitative 
PCR. (B) Southern blots containing genomic DNA (10 ng) 
digested with EcoRI (WISP-1) or XbaI (c-myc) were hybridized with 
a 100-bp human WISP-1 probe (amino acids 186-219) or a human 
c-myc probe (located at bp 1901-2000). The WISP and myc genes are 
detected in normal human genomic DNA after a longer film exposure. 

Fig. 6. Genomic amplification of WISP genes in human colon 
tumors. The relative gene copy number of the WISP genes in 25 
adenocarcinomas was assayed by quantitative PCR, by comparing 
DNA from primary human tumors with pooled DNA from 10 healthy 
donors. The data are means ± SEM from one experiment done in 
triplicate. The experiment was repeated at least three times. 

Fig. 7. WISP RNA expression in primary human colon tumors 
relative to expression in normal mucosa from the same patient. 
Expression of WISP mRNA in 19 adenocarcinomas was assayed by 
quantitative PCR. The Dukes stage of the tumor is listed under the 
sample number. The data are means ± SEM from one experiment 
done in triplicate. The experiment was repeated at least twice. 
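The percentages quoted for the 19 matched tumor samples follow directly from the stated counts. The short sketch below only re-derives them; the dictionary labels are shorthand introduced here and are not the authors' terminology.

```python
def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Counts as stated in the text, each out of 19 tumors.
counts = {
    "WISP-1 increased": 16,
    "WISP-2 decreased": 15,
    "WISP-3 increased": 12,
}

for label, n in counts.items():
    print(f"{label}: {n}/19 = {percent(n, 19)}%")
```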



DISCUSSION 

One approach to understanding the molecular basis of cancer 
is to identify differences in gene expression between cancer 
cells and normal cells. Strategies based on assumptions that 
steady-state mRNA levels will differ between normal and 
malignant cells have been used to clone differentially ex- 
pressed genes (31). We have used a PCR-based selection 
strategy, SSH, to identify genes selectively expressed in 
C57MG mouse mammary epithelial cells transformed by 
Wnt-1. 

Three of the genes isolated, WISP-1, WISP-2, and WISP-3, 
are members of the CCN family of growth factors, which 
includes CTGF, Cyr61, and nov, a family not previously linked 
to Wnt signaling. 

Two independent experimental systems demonstrated that 
WISP induction was associated with the expression of Wnt-1. 
The first was C57MG cells infected with a Wnt-1 retroviral 
vector or C57MG cells expressing Wnt-1 under the control of 
a tetracycline-repressible promoter, and the second was in 
Wnt-1 transgenic mice, where breast tissue expresses Wnt-1, 
whereas normal breast tissue does not. No WISP RNA expression 
was detected in mammary tumors induced by polyoma 
virus middle T antigen (data not shown). These data suggest 
a link between Wnt-1 and WISPs in that in these two situations, 
WISP induction was correlated with Wnt-1 expression. 

It is not clear whether the WISPs are directly or indirectly 
induced by the downstream components of the Wnt-1 signaling 
pathway (i.e., β-catenin-TCF-1/Lef1). The increased levels of 
WISP RNA were measured in Wnt-1-transformed cells, hours 
or days after Wnt-1 transformation. Thus, WISP expression 
could result from Wnt-1 signaling directly through β-catenin 
transcription factor regulation or alternatively through Wnt-1 
signaling turning on a transcription factor, which in turn 
regulates WISPs. 

The WISPs define an additional subfamily of the CCN family 
of growth factors. One striking difference observed in the 
protein sequence of WISP-2 is the absence of a CT domain, 
which is present in CTGF, Cyr61, nov, WISP-1, and WISP-3. 
This domain is thought to be involved in receptor binding and 
dimerization. Growth factors, such as TGF-β, platelet-derived 
growth factor, and nerve growth factor, which contain a cystine 
knot motif exist as dimers (32). It is tempting to speculate that 
WISP-1 and WISP-3 may exist as dimers, whereas WISP-2 
exists as a monomer. If the CT domain is also important for 
receptor binding, WISP-2 may bind its receptor through a 
different region of the molecule than the other CCN family 
members. No specific receptors have been identified for CTGF 
or nov. A recent report has shown that integrin αvβ3 serves as 
an adhesion receptor for Cyr61 (33). 

The strong expression of WISP-1 and WISP-2 in cells lying 
within the fibrovascular tumor stroma in breast tumors from 
Wnt-1 transgenic animals is consistent with previous observations 
that transcripts for the related CTGF gene are primarily 
expressed in the fibrous stroma of mammary tumors 
(34). Epithelial cells are thought to control the proliferation of 
connective tissue stroma in mammary tumors by a cascade of 
growth factor signals similar to that controlling connective 
tissue formation during wound repair. It has been proposed 
that mammary tumor cells or inflammatory cells at the tumor 
interstitial interface secrete TGF-β1, which is the stimulus for 
stromal proliferation (34). TGF-β1 is secreted by a large 
percentage of malignant breast tumors and may be one of the 
growth factors that stimulates the production of CTGF and 
WISPs in the stroma. 

It was of interest that WISP-1 and WISP-2 expression was 
observed in the stromal cells that surrounded the tumor cells 
(epithelial cells) in the Wnt-1 transgenic mouse sections of 
breast tissue. This finding suggests that paracrine signaling 
could occur in which the stromal cells could supply WISP-1 and 
WISP-2 to regulate tumor cell growth on the WISP extracellular 
matrix. Stromal cell-derived factors in the extracellular 
matrix have been postulated to play a role in tumor cell 
migration and proliferation (35). The localization of WISP-1 
and WISP-2 in the stromal cells of breast tumors supports this 
paracrine model. 

An analysis of WISP-1 gene amplification and expression in 
human colon tumors showed a correlation between DNA 
amplification and overexpression, whereas overexpression of 
WISP-3 RNA was seen in the absence of DNA amplification. 
In contrast, WISP-2 DNA was amplified in the colon tumors, 
but its mRNA expression was significantly reduced in the 
majority of tumors compared with the expression in normal 
colonic mucosa from the same patient. The gene for human 
WISP-2 was localized to chromosome 20q12-20q13, at a region 
frequently amplified and associated with poor prognosis in 
node negative breast cancer and many colon cancers, suggesting 
the existence of one or more oncogenes at this locus 
(36-38). Because the center of the 20q13 amplicon has not yet 
been identified, it is possible that the apparent amplification 
observed for WISP-2 may be caused by another gene in this 
amplicon. 

A recent manuscript on rCop-1, the rat orthologue of 
WISP-2, describes the loss of expression of this gene after cell 
transformation, suggesting it may be a negative regulator of 
growth in cell lines (16). Although the mechanism by which 
WISP-2 RNA expression is down-regulated during malignant 
transformation is unknown, the reduced expression of WISP-2 
in colon tumors and cell lines suggests that it may function as 
a tumor suppressor. These results show that the WISP genes 
are aberrantly expressed in colon cancer and suggest that their 
altered expression may confer selective growth advantage to 
the tumor. 

Members of the Wnt signaling pathway have been implicated 
in the pathogenesis of colon cancer, breast cancer, and 
melanoma, including the tumor suppressor gene adenomatous 
polyposis coli and β-catenin (39). Mutations in specific regions 
of either gene can cause the stabilization and accumulation of 
cytoplasmic β-catenin, which presumably contributes to human 
carcinogenesis through the activation of target genes such 
as the WISPs. Although the mechanism by which Wnt-1 
transforms cells and induces tumorigenesis is unknown, the 
identification of WISPs as genes that may be regulated downstream 
of Wnt-1 in C57MG cells suggests they could be 
important mediators of Wnt-1 transformation. The amplification 
and altered expression patterns of the WISPs in human 
colon tumors may indicate an important role for these genes 
in tumor development. 

Appendix E 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

DEX-0172 
Salceda et al. 
09/763,978 
April 25, 2001 
Helms, Larry Ronald 
32800 
1642 
6964 

A Novel Method of Diagnosing, 
Monitoring, Staging, Imaging and 
Treating Various Cancers 

Declaration by Dr. Susana Salceda 

I, Susana Salceda, hereby declare: 

1. I was awarded a Masters of Science in Biochemistry 
in 1983 and a Ph.D. in Biochemistry in 1990, both from the 
School of Science at the University of Buenos Aires, 
Argentina. After obtaining my Ph.D., I served as a 
postdoctoral researcher at Thomas Jefferson University from 
1991 to 1998. While at Thomas Jefferson University I 
contributed to the analysis of mechanisms of oxygen 
sensing, signal transduction and regulation of gene 
expression by hypoxia and other stimuli. 

From 1998 to 2002, I worked in the Gene Discovery 
division at diaDexus, Inc., holding the position of 
Scientist. At diaDexus I contributed to research using 
genomics based analyses focusing on the discovery, 
identification and characterization of novel 
polynucleotides and encoded proteins differentially 
expressed in cancer. Identified polynucleotides and 
encoded proteins were used to develop novel diagnostic and 
therapeutic products for the improved detection, 
classification, prognosis and treatment of cancer. 

Since 2002, I have been a Senior Scientist working in 
the Expression Product Development Department at 
Affymetrix, Inc., in Santa Clara, CA. At Affymetrix I 
contribute to the development of new assays and reagents to 
process DNA and RNA samples for microarray analysis. 

2. As a scientist, a former diaDexus employee, and a 
named inventor, I am familiar with the teachings of the 
above-referenced patent application. I was responsible 
for the discovery of Ovr110 and the sequences encoding it. 

3. I have reviewed and am familiar with the office 
action in the above-referenced patent application dated 
June 22, 2005 from the U.S. Patent Office. 

4. I understand the Examiner has taken a position 
that the "invention is not supported by either a 
substantial asserted utility or a well established 
utility." I respectfully disagree. 

5. At the time of the invention the usefulness of an 
isolated antibody or antibody fragment that binds 
specifically to a cancer marker such as the protein encoded 
by polynucleotide SEQ ID NO: 1 was well known. 

6. Further, at the time of the invention we routinely 
obtained a protein sequence or open reading frame from 
information related to a polynucleotide sequence such as 
that provided for the polynucleotide sequence of SEQ ID NO: 1. 

For example, as shown in Examples 1 and 2 of the above 
referenced patent application, the sequence and expression 
data of SEQ ID NO: 1 are based on an mRNA molecule and 
therefore have a set 5' to 3' orientation. Thus, from this 
information, we know the protein is encoded in the forward 
(5' to 3') direction of SEQ ID NO: 1. 

Furthermore, since expressed mRNAs encode proteins, 
we know that the open reading frame in the forward 
direction of SEQ ID NO: 1 would begin with a codon encoding 
a Methionine near the 5' end, encode many amino acids and 
terminate with a stop codon. Thus, any reading frame 
of SEQ ID NO: 1 containing many stop codons can be 
ruled out, since we know to look for a long open reading 
frame beginning with an M and ending with a stop 
codon, in accordance with the information taught in the 
patent application about SEQ ID NO: 1. 
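The search rule described in this paragraph (scan the three forward frames, start at a Methionine codon, stop at the first in-frame stop codon, keep the longest result) can be sketched in a few lines. This is an illustrative sketch only, not the algorithm used by MAP, Translate or ORF Finder; the function name and the toy sequence are hypothetical, and the standard genetic code is assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq):
    """Return (start, end, frame) of the longest ATG...stop ORF found in
    the three forward frames. Coordinates are 1-based and inclusive and
    include the stop codon. Returns None if no complete ORF is present."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # 0-based index of the first ATG of the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[1] - best[0] + 1:
                    best = (start + 1, i + 3, frame + 1)
                start = None
    return best


# Hypothetical toy sequence: ATG AAA TTT TAG sits in forward frame 3.
print(longest_forward_orf("CCATGAAATTTTAGGG"))
```

Frames interrupted by frequent stop codons yield only short candidates under this rule, which is why a single long ATG-to-stop frame stands out.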

By 1998 there were many tools available for use to 
determine either the protein sequence or the open reading 
frame (ORF) of a sequence such as SEQ ID NO: 1. Examples 
of such programs include the MAP¹ application, part of the 
GCG software suite from Accelrys Software Inc. (San Diego, 
CA), the Translate application, part of ExPASy (Expert 
Protein Analysis System) available online (at 
www.expasy.org/tools/dna.html) from the Swiss Institute of 
Bioinformatics (Lausanne, Switzerland) and the ORF Finder 
(Open Reading Frame Finder) application available online 
(at www.ncbi.nlm.nih.gov/gorf/gorf.html) from the National 
Center for Biotechnology Information (NCBI) (Bethesda, MD). 

¹ Devereux J, Haeberli P, Smithies O. (1984 NAR 12, 387-395) 

As examples, attached are the results of the MAP, 
Translate and ORF Finder programs described above. The 
attached MAP program results (Figure 1) display SEQ ID NO: 
1 as taught in the patent application in the forward 
direction, the reverse complement strand, and the protein 
translation of the three frames of the forward nucleotide 
strand followed by the protein translation of the three 
frames of the reverse complement strand. For clarity, the 
open reading frame and protein encoded by SEQ ID NO: 1 have 
been underlined. As with many programs, the start codons 
encoding a Methionine (denoted by "M" or "Met") and stop 
codons not encoding an amino acid (denoted by "*" or 
"Stop") are in bold. Also displayed in the MAP results, 
but not relevant to the open reading frame or encoded 
protein, are the nucleotide restriction sites for the 
endonuclease SAU3AI. 

The attached Translate program results (Figure 2) 
display the protein translations of the three forward 
frames (5'3') followed by the protein translation of the 
three frames of the reverse complement strand (3'5'). For 
clarity, the protein encoded by SEQ ID NO: 1 has been 
underlined. 
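A six-frame translation of the kind these programs report can be sketched as follows. This is a minimal illustration assuming the standard genetic code; it is not the code of MAP or Translate, and the function names are hypothetical. The example input is the first 60 nucleotides shown in Figure 1.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame):
    """Translate seq from 0-based offset frame; '*' marks a stop codon
    and 'X' marks a codon containing an unrecognized base."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return the three forward and three reverse-complement translations."""
    frames = {}
    rc = reverse_complement(seq)
    for f in range(3):
        frames[f"5'3' Frame {f + 1}"] = translate(seq, f)
        frames[f"3'5' Frame {f + 1}"] = translate(rc, f)
    return frames


first60 = "ggaaggcagcgggcagctccactcagccagtacccagatacgctgggaaccttccccagc"
for name, protein in six_frames(first60).items():
    print(name, protein)
```

For this fragment the first forward frame reproduces the frame "a" line of Figure 1 (GRQRAAPLSQYPDTLGTFPS).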

The attached ORF Finder program results (Figure 3) 
display a graphical representation of the ORFs greater 
than 100 nucleotides in length in each of the six frames of 
SEQ ID NO: 1. The longest open reading frame is listed 
first on the right as frame +2 from nucleotide 62-910 with 
a length of 849 nucleotides. This open reading frame is 
selected (highlighted) in the display and the ORF 
nucleotide sequence and encoded 282 amino acid protein 
sequence are displayed below. 
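The figures quoted for the longest open reading frame are mutually consistent, as the following check illustrates. It assumes that the reported coordinates are 1-based, inclusive, and include the stop codon; that assumption is not stated in the text, but it is the reading under which 849 nucleotides correspond to 282 amino acids. The function name is hypothetical.

```python
def orf_summary(start, end):
    """Summarize an ORF given 1-based inclusive coordinates that are
    assumed to include the terminating stop codon."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "nucleotides": length,
        "frame": (start - 1) % 3 + 1,   # forward frame 1, 2 or 3
        "amino_acids": codons - 1,      # the stop codon encodes no residue
    }


print(orf_summary(62, 910))
```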

Using the attached results from the MAP application, 
Translate application, ORF Finder application, or output 
from another simple translation program, the encoded 
protein and open reading frame are clear. Here MAP, 
Translate or ORF Finder show the protein encoded by SEQ ID 
NO: 1 is 282 amino acids long. Thus, using only the 
information taught in the specification as filed, the open 
reading frame for SEQ ID NO: 1 and the encoded protein can 
be routinely and unambiguously identified. 

7. The Examiner also suggests that there was "no 
indication of what the protein [encoded by SEQ ID NO: 1] 
was." I respectfully disagree. As shown by the attached 
results from the MAP application, Translate application and 
ORF Finder application, the protein encoded by SEQ ID NO: 1 
was readily obtainable with tools used routinely as of 
1998. 

8. Similarly, the process of expressing the protein 
encoded by a nucleotide such as SEQ ID NO: 1 and generating 
antibodies to the protein was well known as of 1998 and 
prior thereto. 

9. I respectfully disagree with the Examiner's 
suggestion that this sequence and invention are "starting 
points for further research and investigation into 
potential practical uses." As shown herein, the nucleotide 
sequence of SEQ ID NO: 1 and the characteristics disclosed 
in the patent application about SEQ ID NO: 1 were adequate 
to routinely and unambiguously obtain the protein sequence 
and then generate antibodies or antibody fragments thereto. 

10. I also respectfully disagree with the Examiner's 
suggestions that "one would not have known a utility for 
such a protein" and that the "specification does not teach 
a utility for use of the antibody." The patent application 
teaches that "the mRNA overexpression in most of the 
matching samples tested are indicative of Ovr110... being a 
diagnostic marker for gynecologic cancers." Further, uses 
for the protein expressed by the CSG encoded by SEQ ID NO: 
1 are explicitly described in the specification. Since the 
mRNA of SEQ ID NO: 1 is overexpressed in gynecologic 
cancer samples, and encodes a protein, the value of 
antibodies to this protein to detect overexpressed protein 
in gynecologic cancers would also be understood. 

Further, the specification explicitly teaches that 
antibodies against Cancer Specific Genes (CSG) such as SEQ 
ID NO: 1 "can be used to detect or image localization of 
CSG in a patient for the purpose of detecting or diagnosing 
selected cancers." 

The specification also explicitly teaches that 
antibodies against Cancer Specific Genes (CSG) such as SEQ 
ID NO: 1 "can be injected into a patient suspected of 
having a selected cancer for diagnostic and/or therapeutic 
purposes." 

Furthermore, contrary to the Examiner's suggestion, 
the specification provides detailed teachings as to how one 
of skill in the art could use these antibodies in an ELISA 
assay or a competition assay to detect cancer, thus 
providing guidance regarding use of the invention, "in a 
manner that constitutes a substantial utility." 

11. I also respectfully disagree with the Examiner's 
suggestion that "applicants were not in possession of any 
protein encoded by SEQ ID NO: 1." As I showed herein, using 
standard tools available at the time of the invention, one 
of skill in the art could readily determine the protein 
encoded by SEQ ID NO: 1. All the necessary information to 
do so is provided by the polynucleotide sequence and the 
characteristics of this sequence taught in the patent 
application. 

I hereby declare that all statements herein of my own 
knowledge are true and that all statements made on 
information or belief are believed to be true; and further 
that these statements were made with the knowledge that 
willful statements and the like so made are punishable by 
fine or by imprisonment, or both, under §1001 of Title 18 
of the United States Code, and that such willful statements 
may jeopardize the validity of the application, any patent 
issuing there upon, or any patent to which this verified 
statement is directed. 




Date 
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FIGURE 1 

1/9 

(Linear) MAP of: dex0043_l . seq check: 5695 from: 1 to: 2587 

DEX0043_1 

With 1 enzymes: SAU3AI 

Forward frame translations: 

gga aggcagc gggc age t c c ac t cage cag t acc caga t ac gc t gggaac cttccccagc 

1 + + + + + + 60 

ccttccgtcgcccgtcgaggtgagtcggtcatgggtctatgcgacccttggaaggggtcg 

a GRQRAAPLSQYPDTLGTFPS 

b EGSGQLHSASTQIRWEPSPA- 

c KAA G S S TQ PVP RYAGNL PQ P - 

Sau3AI 
I 

c atggcttccctggggcagatcctcttctggagcataattagcatcatcattattctggc 

61 + + + + + 120 

gtaccgaagggaccccgtctaggagaagacctcgtattaatcgtagtagtaataagaccg 

a HGFPGADPLLEHN*HHHYSG 

b MASLGQI.LFWSIISIIIILA - 

c WLPWGRSSSGA*LASSLFWL- 

tggagcaattgcactcatcattggctttggtatttcagggagacactccatcacagtcac 

121 + + + + + + 180 

acctcgttaacgtgagtagtaaccgaaaccataaagtccctctgtgaggtagtgtcagtg 

a WSNCTHHWLWYFRETIiHHSH 

b GAIALIIGFGISGRHSITVT - 

c ~E Q L H S S L A L V F Q G D T P S Q S L - 

tactgtcgcctcagctgggaacattggggaggatggaatcctgagctgcacttttgaacc 

181 + + + + + 240 

atgacagcggagtcgacccttgtaacccctcctaccttaggactcgacgtgaaaacttgg 

YCRLSWEHWGGWNPELHF *T - * 

tvasagnige'dgilsctfep - 
~z s p^ q l g t l g r h e s * a a l l n l - 

tgacatcaaactttctgatatcgtgatacaatggctgaaggaaggtgttttaggcttggt 

241 + + + +' + + 300 

actgtagtttgaaagactatagcactatgttaccgacttccttccacaaaatccgaacca 

*HQTF*YRDTMAEGRCFRLG 
DIKLSDIVIQWLKEGVLGLV - 
TSNFLIS*YNG*RKVF*AWS- 

ccatgagttcaaagaaggcaaagatgagctgtcggagcaggatgaaatgttcagaggccg 

ggtactcaagtttcttccgtttctactcgacagcctcgtcctactttacaagtctccggc 

P*VQRRQR*AVGAG*NVQRP 
HEFKEGKDELSEQDEMFRGR - 
MSSKKAKMSCRSRMKCSEAG - 
FIGURE 1 

2/9 

Sau3AI 
I 

gacagcagtgtttgctgatcaagtgatagttggcaatgcctctttgcggctgaaaaacgt 

361 + + + + + + 420 

ctgtcgtcacaaacgactagttcactatcaaccgttacggagaaacgccgactttttgca 

DSSVC*SSDSWQCLFAAEKR 
TAVFADQVIVGNASLRLKNV - 
QQCLLIK**LAMPLCG*KTC- 

gcaactcacagatgctggcacctacaaatgttatatcatcacttctaaaggcaaggggaa 

421 + + + + + + 480 

cgttgagtgtctacgaccgtggatgtttacaatatagtagtgaagatttccgttcccctt 

ATHRCWHLQMLYHHF*RQGE 
QLTDAGTYKCYIITSKGKGN - 
NSQMLAPTNVISSLLKARGM- 

tgctaaccttgagtataaaactggagccttcagcatgccggaagtgaatgtggactataa 

481 + + + + + + 540 

acgattggaactcatattttgacctcggaagtcgtacggccttcacttacacctgatatt 

C*P*V*NWSLQHAGSECGL* 
ANLEYKTGAFSMPEVNVDYN - 
LTLSIKLEPSACRK*MWTIM- 

tgccagctcagagaccttgcggtgtgaggctccccgatggttcccccagcccacagtggt 

541 + + + + + + 600 

acggtcgagtctctggaacgccacactccgaggggctaccaagggggtcgggtgtcacca 

CQLRDLAV*GSPMVPPAHSG 
ASSETLRCEAPRWFPQPTVV - 
PAQRPCGVRLPDGSPSPQWS - 

ctgggcatcccaagttgaccagggagccaacttctcggaagtctccaataccagctttga 

601 + + + + + + 660 

gacccgtagggttcaactggtccctcggttgaagagccttcagaggttatggtcgaaact 

LGIPS*PGSQLLGSLQYQL* 
WASQVDQGANFSEVSNTSFE - 
GHPKLTREPTSRKSPIPALS- 

Sau3AI 

gctgaactctgagaatgtgaccatgaaggttgtgtctgtgctctacaatgttacgatcaa 

661 + + + + + + 720 

cgacttgagactcttacactggtacttccaacacagacacgagatgttacaatgctagtt 

AEL*ECDHEGCVCALQCYDQ 
LNSENVTMKVVSVLYNVTIN - 
*TLRM*P*RLCLCSTMLRST- 
FIGURE 1 

3/9 

caacacatactcctgtatgattgaaaatgacattgccaaagcaacaggggatatcaaagt 

721 + + + + + + 780 

gttgtgtatgaggacatactaacttttactgtaacggtttcgttgtcccctatagtttca 

QHILLYD*K*HCQSNRGYQS 
NTYSCMIENDIAKATGDIKV - 
THTPV*LKMTLPKQQGISK* - 

Sau3AI 
I 

gacagaatcggagatcaaaaggcggagtcacctacagctgctaaactcaaaggcttctct 

781 + + + + + + 840 

ctgtcttagcctctagttttccgcctcagtggatgtcgacgatttgagtttccgaagaga 

DRIGDQKAESPTAAKLKGFS 
TESEIKRRSHLQLLNSKASL - 
QNRRSKGGVTYSC*TQRLLC - 

gtgtgtctcttctttctttgccatcagctgggcacttctgcctctcagcccttacctgat 

841 + + + + + + 900 

cacacagagaagaaagaaacggtagtcgacccgtgaagacggagagtcgggaatggacta 

VCLFFLCHQLGTSASQPLPD 
CVSSFFAISWALLPLSPYLM - 
VSLLSLPSAGHFCLSALT*C- 

Sau3AI 

gctaaaataatgtgccttggccacaaaaaagcatgcaaagtcattgttacaacagggatc 

901 + + + + + + 960 

cgattttattacacggaaccggtgttttttcgtacgtttcagtaacaatgttgtccctag 

AKIMCLGHKKACKVIVTTGI 
LK*CALATKKHAKSLLQQGS - 
*NNVPWPQKSMQSHCYNRDL- 

tacagaactatttcaccaccagatatgacctagttttatatttctgggaggaaatgaatt 

961 + + + + + + 1020 

atgtcttgataaagtggtggtctatactggatcaaaatataaagaccctcctttacttaa 

YRTISPPDMT*FYISGRK*I 
TELFHHQI*PSFIFLGGNEF- 
QNYFTTRYDLVLYFWEEMNS- 

catatctagaagtctggagtgagcaaacaagagcaagaaacaaaaagaagccaaaagcag 

1021 + + + + + + 1080 

gtatagatcttcagacctcactcgtttgttctcgttctttgtttttcttcggttttcgtc 

HI*KSGVSKQEQETKRSQKQ 
ISRSLE*ANKSKKQKEAKSR- 
YLEVWSEQTRARNKKKPKAE- 
FIGURE 1 

4/9 

aaggctccaatatgaacaagataaatctatcttcaaagacatattagaagttgggaaaat 

1081 + + + + + + 1140 

ttccgaggttatacttgttctatttagatagaagtttctgtataatcttcaaccctttta 

a KAPI*TR*IYLQRHIRSWEN 

b RLQYEQDKSIFKDILEVGKI- 

c GSNMNKINLSSKTY*KLGK* - 
FIGURE 1 

5/9 

Reverse frame translations: 

aattcatgtgaactagacaagtgtgttaagagtgataagtaaaatgcacgtggagacaag 

1141 + + + + + + 1200 

ttaagtacacttgatctgttcacacaattctcactattcattttacgtgcacctctgttc 

a NSCELDKCVKSDK*NARGDK 

b IHVN*TSVLRVISKMHVETS- 

c FM*TRQVC*E**VKCTWRQV- 

Sau3AI 
I 

tgcatccccagatctcagggacctccccctgcctgtcacctggggagtgagaggacagga 

1201 + + + + + + 1260 

acgtaggggtctagagtccctggagggggacggacagtggacccctcactctcctgtcct 

a CIPRSQGPPPACHLGSERTG 
b ASPDLRDLPLPVTWGVRGQD 

c HPQISGTSPCLSPGE*EDRI- 

tagtgcatgttctttgtctctgaatttttagttatatgtgctgtaatgttgctctgagga 

1261 + + + + 1320 

atcacgtacaagaaacagagacttaaaaatcaatatacacgacattacaacgagactcct 

a *CMFFVSEFLVICAVMLL*G 

b SACSLSLNF*LYVL*CCSEE- 

c VHVLCL*IFSYMCCNVALRK- 

agcccctggaaagtctatcccaacatatccacatcttatattccacaaattaagctgtag 

1321 + + + + + + 1380 

tcggggacctttcagatagggttgtataggtgtagaatataaggtgtttaattcgacatc 

a SPWKVYPNISTSYIPQIKL* 

b APGKSIPTYPHLIFHKLSCS- 

c PLESLSQHIHILYSTN*AVV- 

tatgtaccctaagacgctgctaattgactgccacttcgcaactcaggggcggctgcattt 

1381 + + + + + + 1440 

atacatgggattctgcgacgattaactgacggtgaagcgttgagtccccgccgacgtaaa 

a YVP*DAAN*LPLRNSGAAAF 

b MYPKTLLIDCHFATQGRLHF- 

c CTLRRC*LTATSQLRGGCIL- 

tagtaatgggtcaaatgattcactttttatgatgcttccaaaggtgccttggcttctctt 

1441 + + + + + + 1500 

atcattacccagtttactaagtgaaaaatactacgaaggtttccacggaaccgaagagaa 

a **WVK*FTFYDASKGALASL 

b SNGSNDSLFMMLPKVPWLLF- 

c VMGQMIHFL*CFQRCLGFSS- 
FIGURE 1 

6/9 

Sau3AI 

I 

cccaactgacaaatgccaaagttgagaaaaatgatcataattttagcataaacagagcag 

1501 + + + + + + 1560 

gggttgactgtttacggtttcaactctttttactagtattaaaatcgtatttgtctcgtc 

PN*QMPKLRKMIIILA*TEQ 
PTDKCQS*EK*S*F*HKQSS- 
QLTNAKVEKNDHNFSINRAV - 

tcggcgacaccgattttataaataaactgagcaccttctttttaaacaaacaaatgcggg 

1561 + + + + + + 1620 

agccgctgtggctaaaatatttatttgactcgtggaagaaaaatttgtttgtttacgccc 

SATPIL*IN*APSF*TNKCG 
RRHRFYK*TEHLLFKQTNAG - 
GDTDFINKLSTFFLNKQMRV- 

tttatttctcagatgatgttcatccgtgaatggtccagggaaggacctttcaccttgact 

1621 + + + + + + 1680 

aaataaagagtctactacaagtaggcacttaccaggtcccttcctggaaagtggaactga 

FISQMMFIREWSREGPFTLT 
LFLR*CSSVNGPGKDLSP*L - 
YFSDDVHP*MVQGRTFHLDY- 

atatggcattatgtcatcacaagctctgaggcttctcctttccatcctgcgtggacagct 

1681 + + + + + 1740 

tataccgtaatacagtagtgttcgagactccgaagaggaaaggtaggacgcacctgtcga 

IWHYVITSSEASPFHPAWTA 
YGIMSSQALRLLLSILRGQL- 
MALCHHKL*GFSFPSCVDS*- 

aagacctcagttttcaatagcatctagagcagtgggactcagctggggtgatttcgcccc 

1741 + + + + + + 1800 

ttctggagtcaaaagttatcgtagatctcgtcaccctgagtcgaccccactaaagcgggg 

KTSVFNSI*SSGTQLG*FRP 
RPQFSIASRAVGLSWGDFAP - 
DLSFQ*HLEQWDSAGVISPP- 

ccatctccgggggaatgtctgaagacaattttggttacctcaatgagggagtggaggagg 

1801 + + + + + + 1860 

ggtagaggcccccttacagacttctgttaaaaccaatggagttactccctcacctcctcc 

PSPGECLKTILVTSMREWRR 
HLRGNV*RQFWLPQ*GSGGG 
ISGGMSEDNFGYLNEGVEED- 
FIGURE 1 

7/9 



atacagtgctactaccaactagtggataaaggccagggatgctgctcaacctcctaccat 

1861 + + + + + + 1920 

tatgtcacgatgatggttgatcacctatttccggtccctacgacgagttggaggatggta 



IQCYYQLVDKGQGCCSTSYH - 
YSATTN*WIKARDAAQPPTM- 
TVLLPTSG*RPGMLLNLLPC- 

gtacaggacgtctccccattacaactacccaatccgaagtgtcaactgtgtcaggactaa 

1921 + + + + + + 1980 

catgtcctgcagaggggtaatgttgatgggttaggcttcacagttgacacagtcctgatt 

VQDVSPLQLPNPKCQLCQD* 
YRTSPHYNYPIRSVNCVRTK- 
TGRLPITTTQSEVSTVSGLR- 



gaaaccctggttttgagtagaaaagggcctggaaagaggggagccaacaaatctgtctgc 

1981 + + + + + + 2040 

ctttgggaccaaaactcatcttttcccggacctttctcccctcggttgtttagacagacg 

ETLVLSRKGPGKRGANKSVC 
KPWF*VEKGLERGEPTNLSA- 
NPGFE*KRAWKEGSQQICLL- 



ttctcacattagtcattggcaaataagcattctgtctctttggctgctgcctcagcacag 

2041 + + + + + + 2100 

aagagtgtaatcagtaaccgtttattcgtaagacagagaaaccgacgacggagtcgtgtc 

FSH*SLANKHSVSLAAASAQ 
SHISHWQISILSLWLLPQHR- 
LTLVIGK*AFCLFGCCLSTE- 

agagccagaactctatcgggcaccaggataacatctctcagtgaacagagttgacaaggc 

2101 + + + + + + 2160 

tctcggtcttgagatagcccgtggtcctattgtagagagtcacttgtctcaactgttccg 
RARTLSGTRITSLSEQS*QG 

EPELYRAPG*HLSVNRVDKA- 

SQNSIGHQDNISQ*TELTRP- 

ctatgggaaatgcctgatgggattatcttcagcttgttgagcttctaagtttctttccct 

2161 + + + + + + 2220 

gataccctttacggactaccctaatagaagtcgaacaactcgaagattcaaagaaaggga 

LWEMPDGIIFSLLSF*VSFP 
YGKCLMGLSSAC*ASKFLSL- 
MGNA*WDYLQLVELLSFFPF- 

tcattctaccctgcaagccaagttctgtaagagaaatgcctgagttctagctcaggtttt 

2221 + + + + + + 2280 

agtaagatgggacgttcggttcaagacattctctttacggactcaagatcgagtccaaaa 

SFYPASQVL*EKCLSSSSGF 
HSTLQAKFCKRNA*VLAQVF- 
ILPCKPSSVREMPEF*LRFS - 






FIGURE 1 

8/9 

Sau3AI 
I 

cttactctgaatttagatctccagacccttcctggccacaattcaaattaaggcaacaaa 

2281 + + + + + + 2340 

gaatgagacttaaatctagaggtctgggaaggaccggtgttaagtttaattccgttgttt 

LTLNLDLQTLPGHNSN*GNK 
LL*I*ISRPFLATIQIKATN- 
YSEFRSPDPSWPQFKLRQQT- 

catataccttccatgaagcacacacagacttttgaaagcaaggacaatgactgcttgaat 

2341 + + + + + + 2400 

gtatatggaaggtacttcgtgtgtgtctgaaaactttcgttcctgttactgacgaactta 

HIPSMKHTQTFESKDNDCLN - 
IYLP*STHRLLKARTMTA*I- 
YTFHEAHTDF*KQGQ*LLEL- 

tgaggccttgaggaatgaagctttgaaggaaaagaatactttgtttccagcccccttccc 

2401 + + + + + + 2460 

actccggaactccttacttcgaaacttccttttcttatgaaacaaaggtcgggggaaggg 

*GLEE*SFEGKEYFVSSPLP 
EALRNEALKEKNTLFPAPFP- 
RP*GMKL*RKRILCFQPPSH- 

acactcttcatgtgttaaccactgccttcctggaccttggagccacggtgactgtattac 

2461 + + + + + + 2520 

tgtgagaagtacacaattggtgacggaaggacctggaacctcggtgccactgacataatg 

TLFMC*PLPSWTLEPR*LYY 
HSSCVNHCLPGPWSHGDCIT- 
TLHVLTTAFLDLGATVTVLH- 

Sau3AI 
I 

atgttgttatagaaaactgattttagagttctgatcgttcaagagaatgattaaatatac 

2521 + + + + + + 2580 

tacaacaatatcttttgactaaaatctcaagactagcaagttctcttactaatttatatg 

MLL*KTDFRVLIVQEND*IY 
CCYRKLILEF*SFKRMIKYT- 
VVIEN*F*SSDRSRE*LNIH- 

atttcct 

2581 2587 

taaagga 

IS 
F P 
F 
Enzymes that do cut: 

Sau3AI 
Enzymes that do not cut: 

NONE 
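
The map above marks each Sau3AI site; the recognition sequence of Sau3AI is GATC. Purely as an illustration, the following Python sketch locates that sequence in a stretch of the top strand. The function name `find_sites` and its `offset` parameter are hypothetical and are not part of the mapping program that produced Figure 1.

```python
def find_sites(seq, site="GATC", offset=1):
    """Return the 1-based positions at which `site` begins in `seq`.

    `offset` is the map position of the first base of `seq`, so that a
    fragment cut out of a longer sequence keeps the numbering of the
    full-length map. (Hypothetical helper, for illustration only.)
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        # Step one base forward so overlapping matches are not skipped.
        start = seq.find(site, start + 1)
    return positions


# Bases 61 to 120 of DEX0043_1, copied from sheet 1/9 of Figure 1.
fragment = "catggcttccctggggcagatcctcttctggagcataattagcatcatcattattctggc"
sites = find_sites(fragment, offset=61)
print(sites)
```

The sketch reports where the recognition sequence begins, which is enough to compare against the tick marks on the map; it does not model the cleavage chemistry.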



FIGURE 
Translate Tool - Results of translation 



Please select one of the following frames : 
5'3' Frame 1 

XGRQRAAPLSQYPDTLGTFPSHGFPGADPLLEHN Stop HHHYSG 
WSNCTHHWLWYFRETLHHSHYCRLSWEHWGGWNPELHF Stop T 
Stop HQTF Stop YRDT Met AEGRCFRLGP Stop VQRRQR Stop AVGAG 
Stop NVQRPDSSVC Stop SSDSWQCLFAAEKRATHRCWHLQ Met LY 
HHF Stop RQGEC Stop P Stop V Stop NWSLQHAGSECGL Stop CQLRDLA 
V Stop GSP Met VPPAHSGLGIPS Stop PGSQLLGSLQYQL Stop AEL Stop 
ECDHEGCVCALQCYDQQHILLYD Stop K Stop HCQSNRGYQSDRI 
GDQKAESPTAAKLKGFSVCLFFLCHQLGTSASQPLPDAKI Met C 
LGHKKACKVIVTTGIYRTISPPD Met T Stop FYISGRK Stop IHI Stop K 
SGVSKQEQETKRSQKQKAPI Stop TR Stop IYLQRHIRSWENNSCEL 
DKCVKSDK Stop NARGDKCIPRSQGPPPACHLGSERTG Stop C Met F 
FVSEFLVICAV Met LL Stop GSPWKVYPNISTSYIPQIKL Stop YVP 
Stop DAAN Stop LPLRNSGAAAF Stop Stop WVK Stop FTFYDASKGALA 
SLPN Stop Q Met PKLRK Met IIILA Stop TEQSATPIL Stop IN Stop APSF 
Stop TNKCGFISQ Met Met FIREWSREGPFTLTIWHYVITSSEASPFH 
PAWTAKTSVFNSI Stop SSGTQLG Stop FRPPSPGECLKTILVTS Met 
REWRRIQCYYQLVDKGQGCCSTSYHVQDVSPLQLPNPKCQLC 
QD Stop ETLVLSRKGPGKRGANKSVCFSH Stop SLANKHSVSLAAA 
SAQRARTLSGTRITSLSEQS Stop QGLWE Met PDGIIFSLLSF Stop V 
SFPSFYPASQVL Stop EKCLSSSSGFLTLNLDLQTLPGHNSN Stop G 
NKHIPS Met KHTQTFESKDNDCLN Stop GLEE Stop SFEGKEYFVSSP 
LPTLF Met C Stop PLPSWTLEPR Stop LYY Met LL Stop KTDFRVLIVQE 
ND Stop IYIS 

5'3' Frame 2 

EGSGQLHSASTQIRWEPSPA Met ASLGQILFWSIISIIIILAGAIA 
LIIGFGISGRHSITVTTVASAGNIGEDGILSCTFEPDIKLSDIVIQ 
WLKEGVLGLVHEFKEGKDELSEQDE Met FRGRTAVFADQVIVG 
NASLRLKNVQLTDAGTYKCYIITSKGKGNANLEYKTGAFS Met P 
EVNVDYNASSETLRCEAPRWFPQPTVVWASQVDQGANFSEVS 
NTSFELNSENVT Met KVVSVLYNVTINNTYSC Met IENDIAKATG 
DIKVTESEIKRRSHLQLLNSKASLCVSSFFAISWALLPLSPYL 
Met LK Stop CALATKKHAKSLLQQGSTELFHHQI Stop PSFIFLGGNE 
FISRSLE Stop ANKSKKQKEAKSRRLQYEQDKSIFKDILEVGKIIH 
VN Stop TSVLRVISK Met HVETSASPDLRDLPLPVTWGVRGQDSA 
CSLSLNF Stop LYVL Stop CCSEEAPGKSIPTYPHLIFHKLSCS Met Y 
PKTLLIDCHFATQGRLHFSNGSNDSLF Met Met LPKVPWLLFPTD 
KCQS Stop EK Stop S Stop F Stop HKQSSRRHRFYK Stop TEHLLFKQTNA 
GLFLR Stop CSSVNGPGKDLSP Stop LYGI Met SSQALRLLLSILRGQ 
LRPQFSIASRAVGLSWGDFAPHLRGNV Stop RQFWLPQ Stop GSGG 
GYSATTN Stop WIKARDAAQPPT Met YRTSPHYNYPIRSVNCVRT 
KKPWF Stop VEKGLERGEPTNLSASHISHWQISILSLWLLPQHRE 
PELYRAPG Stop HLSVNRVDKAYGKCL Met GLSSAC Stop ASKFLSL 
HSTLQAKFCKRNA Stop VLAQVFLL Stop I Stop ISRPFLATIQIKATN 
IYLP Stop STHRLLKART Met TA Stop IEALRNEALKEKNTLFPAPFP 
HSSCVNHCLPGPWSHGDCITCCYRKLILEF Stop SFKR Met IKYTF 
P 



5'3' Frame 3 

XKAAGSSTQPVPRYAGNLPQPWLPWGRSSSGA Stop LASSLFWL 
EQLHSSLALVFQGDTPSQSLLSPQLGTLGR Met ES Stop AALLNLT 
SNFLIS Stop YNG Stop RKVF Stop AWS Met SSKKAK Met SCRSR Met KC 
SEAGQQCLLIK Stop Stop LA Met PLCG Stop KTCNSQ Met LAPTNVISS 
LLKARG Met LTLSIKLEPSACRK Stop Met WTI Met PAQRPCGVRLPD 
GSPSPQWSGHPKLTREPTSRKSPIPALS Stop TLR Met Stop P Stop RLC 
LCST Met LRSTTHTPV Stop LK Met TLPKQQGISK Stop QNRRSKGGV 
TYSC Stop TQRLLCVSLLSLPSAGHFCLSALT Stop C Stop NNVPWPQ 
KS Met QSHCYNRDLQNYFTTRYDLVLYFWEE Met NSYLEVWSEQ 
TRARNKKKPKAEGSN Met NKINLSSKTY Stop KLGK Stop F Met Stop T 
RQVC Stop E Stop Stop VKCTWRQVHPQISGTSPCLSPGE Stop EDRIV 
HVLCL Stop IFSY Met CCNVALRKPLESLSQHIHILYSTN Stop AVVC 
TLRRC Stop LTATSQLRGGCILV Met GQ Met IHFL Stop CFQRCLGFSS 
QLTNAKVEKNDHNFSINRAVGDTDFINKLSTFFLNKQ Met RVYF 
SDDVHP Stop Met VQGRTFHLDY Met ALCHHKL Stop GFSFPSCVDS 
Stop DLSFQ Stop HLEQWDSAGVISPPISGG Met SEDNFGYLNEGVE 
EDTVLLPTSG Stop RPG Met LLNLLPCTGRLPITTTQSEVSTVSGLR 
NPGFE Stop KRAWKEGSQQICLLLTLVIGK Stop AFCLFGCCLSTES 
QNSIGHQDNISQ Stop TELTRP Met GNA Stop WDYLQLVELLSFFPFI 
LPCKPSSVRE Met PEF Stop LRFSYSEFRSPDPSWPQFKLRQQTYT 
FHEAHTDF Stop KQGQ Stop LLELRP Stop G Met KL Stop RKRILCFQPPS 
HTLHVLTTAFLDLGATVTVLHVVIEN Stop F Stop SSDRSRE Stop LN 
IHF 

3'5' Frame 1 

RKCIFNHSLERSEL Stop NQFSITTCNTVTVAPRSRKAVVNT Stop R 
VWEGGWKQSILFLQSFIPQGLNSSSHCPCFQKSVCASWKVYVC 
CLNLNCGQEGSGDLNSE Stop E N L S Stop NSGISLTELGLQGR Met K 
GKKLRSSTS Stop R Stop SHQAFPIGLVNSVH Stop E Met LSWCPIEFW 
LSVLRQQPKRQNAYLP Met TNVRSRQICWLPSFQ ALFYSKPGFL 
SPDTVDTSDWVVVMetGRRPVHGRRLSSIPGLYPLYGSSTVSSS 
TPS LR Stop PKLSSDIPPE Met GGEITPAESHCSRCY Stop K L R S Stop L 
STQDGKEKPQSL Stop StopHN A I Stop S R Stop KV LP WTIHGStopTSSE 
KStop TRICLFKKKVLSLFIKSV S PT ALFMetLKLStop SFFSTLAFVS 
WEEKPRHLWKHHKKStopIIStop PITKMet QP PLS CE V A VNStop QRL 
R VHTT A StopF VE YKMet WIC WD RLS RGFLR ATLQ HIStopLKIQRQ 
RTCTILS S HS P GDR QGE VPEI WG CTCLHVHFT YHS StopHTCL VH 
MetN YFPNFStop Y VFED RFILFILEPS AF GFFLFL AL VC S LQTS R Y 
EFISSQKYKTRSYLVVK Stop F C R S L L Stop Q Stop L C Met LF C G Q G T L 
F Stop H Q V R A E R Q K C P A D G K E R RD T Q R S L Stop V Stop Q L Stop V T P P F 
DLRFCHFDIP C CFGN VIFNHTG V C VVDRNI VEHRHNLHGHILR 
VQLKAGIGDFREVGSLVNLGCPDHCGLGEPSGSLTPQGLStopA 
GIIVHIHFRHAEGSSFILKVSIPLAFRSDDITFVGASICELHVFQP 
QRGIANYHLISKHCCPASEHFILLRQLIFAFFEL Met DQAStopNTF 
LQ PLYHD IRKFD VRFKS A AQD SILPN VPS Stop GDSSDCDGVSP 
StopNTKANDECNCS S QNNDD AN YAPEEDLPQGS HGWGRFP A Y 

LGTGStop VELPAAFXX 
3'5' Frame 2 

GNVYLIILLNDQNSKISFLStopQHVIQSPWLQGPGRQWLTHEEC 
GKGAGNKVFFSFKASFLKASIQAVIVLAFKSLCVLHGRY Met F V 
A L I Stop IVARKGLEI Stop I Q S K K T Stop ARTQAFLLQNLACRVE Stop 
RERNLEAQQAEDNPIRHFP Stop ALSTLFTERCYPGAR Stop S S G S L 
C Stop GSSQRDR Met L I C Q Stop L Met Stop EADRFVGSPLSRPFSTQNQ 
GFLVLTQLTLRIG Stop L Stop WGDVLY Met V G G Stop AASLAFIH Stop 
LVVALYPPPLPH Stop GNQNCLQTFPRRWGAKSPQLSPTALDAIE 
N Stop G L S C P R R Met ERRSLRACDDI Met PYSQGERSFPGPFTDEHH 
LRNKPAFVCLKRRCS V YLStopNRCRRLLCLCStopN YDHFS QLW 
HLSVGKRSQGTFGSIIKSESFDPLLKCSRP Stop VAKWQSISSVLG 
YILQLNLWNIRCGYVGIDFPGASSEQHYS TY N Stop KFRDKEHAL 
SCPLTPQVTGRGRSLRSGDALVSTCILLITLNTLV Stop F T Stop 1 1 F 
PTSNMetS.LKIDLS CS YWSLLLLASFCFLLLF AHS RLLDMetNSFP 
PRNIKLGHIW WStop NS S VDPCCNNDFACFF V AKAHYFSIRStop G 
LRGRS AQLMet AKKEETHRE AFEFS S CR Stop LRL LIS D S VT LIS PV 
ALA Met SFSIIQEYVLLIVTL Stop S T D T T F Met VTFSEFSSKLVLET 
SEKLAPWSTWDAQTTVGWGNHRGASHRKVSELAL Stop S T F T S G 
Met LKAPVLYSRLAFPLPLEV Met I Stop H L Stop VPAS VSCTFFSRKE 
ALPTITStopSANTAVRPLNISSCSDSSSLPSLNSWTKPKTPSFSH 
CITISES LMetS GSKVQLRIPS SPMetFP AE AT V VT VMetECLPEIPK 
P Met Met SAIAPARI Met Met Met L I Met LQKRICPREA Met AGEGSQRI 
WVLAEWSCPLPSX 

3'5' Frame 3 

E Met Y I Stop S F S Stop TIRTLKS VFYNN Met Stop YSHRGSKVQEGSG 
Stop H Met KSVGRGLETKYSFPSKLHSSRPQFKQSLSLLSKVCVCF 
Met E G I C L L P Stop FELWPGRVWRSKFRVRKPELELRHFSYRTWL 
AGStopNEGKETStopKLNKLKIIPS GISHRPCQLCSLRDVILVPDR 
VLALCAEAAAKETECLFAND Stop CEKQTDLLAPLFPGPFLLKTR 
VS StopS Stop HSStopHFGLGSCNGETSCTWStopEVEQHPWPLSTSW 
Stop StopHCILLHS LIE VTKIVFRHSPGD GGRNHPS Stop VPLLStop 
MetLLKTE VL A VH A GWKGB A S EL VMetTStop CHI V K VKGP S LDHS 
R Met N 1 1 S top E I N P H L F V S top K E G A Q F I Y K I G V A D C S V Y A K I Met 1 1 F 
LNFGIC QLGRE AK APLE AS StopK VNHLTH YStopN A A APELRS G S 
Q L A A S Stop GTYYSLICGI Stop DVD Met L G Stop TFQGLPQSNITAHIT 
KNS ETKNMetH YP VLS LPR Stop Q A G G GPStopDLGMetHLS PR AF YL 
SLLTHLSSSHELFS QLLICLStopRStopI YL V HIG AF CF WLLF VS CS 
CLLTPDFStopIStopIHFLPEIStopNStopVISGGEIVLStopIPVVTMetT 
LHAFLWPRHIILASGKG Stop E A E V P S Stop WQRKKRHTEKPLSLA 
A V G D S A F Stop S P I L S L Stop Y P L L L W Q CHFQSYRS Met C C Stop S Stop 
HCRAQTQPS WSHSQSS AQSWYWRLPRS WLP GQLGMetPRPL W A 
GGTIGEPHT AR S LS WH Y S PH $ LP A C Stop R LQ F YT Q G Stop H S P C L 
StopKStop Stop YNICR C QHLStop V ARFS A AKRHC QL S LD QQTLLS G 
L Stop TFHPAPTAHLCLL Stop THGPSLKHLPSAIVSRYQKV Stop C Q 
VQKCSSGFHPPQCS QLRRQStop Stop L Stop WSVSLKYQSQ Stop Stop 
V Q L L Q P E Stop Stop Stop C Stop LC S RR G S A P GKP WLG K VP S V S G Y WL 
SGAARCLXX 
ORF Finder (Open Reading Frame Finder) 






DEX0043_1 







Frame from to Length 






Length: 282 aa 



62 atggcttccctggggcagatcctcttctggagcataattagcatc 

MASLGQILFWSIISI 
107 atcattattctggctggagcaattgcactcatcattggctttggt 

IIILAGAIALIIGFG 
152 atttcagggagacactccatcacagtcactactgtcgcctcagct 

ISGRHSITVTTVASA 
197 gggaacattggggaggatggaatcctgagctgcacttttgaacct 

GNIGEDGILSCTFEP 
242 gacatcaaactttctgatatcgtgatacaatggctgaaggaaggt 

DIKLSDIVIQWLKEG 
287 gttttaggcttggtccatgagttcaaagaaggcaaagatgagctg 

VLGLVHEFKEGKDEL 
332 tcggagcaggatgaaatgttcagaggccggacagcagtgtttgct 

SEQDEMFRGRTAVFA 
377 gatcaagtgatagttggcaatgcctctttgcggctgaaaaacgtg 

DQVIVGNASLRLKNV 
422 caactcacagatgctggcacctacaaatgttatatcatcacttct 

QLTDAGTYKCYIITS 
467 aaaggcaaggggaatgctaaccttgagtataaaactggagccttc 

KGKGNANLEYKTGAF 
512 agcatgccggaagtgaatgtggactataatgccagctcagagacc 

SMPEVNVDYNASSET 



Frame   from..to      Length 
+2      62..910       849 
-2      1..354        354 
-1      1835..2134    300 
+3      933..1127     195 
-2      1576..1725    150 
-2      973..1122     150 
-2      535..684      150 
-3      2328..2471    144 
+2      1382..1525    144 
+1      1843..1980    138 
+3      528..665      138 
+1      1633..1767    135 
+2      1691..1822    132 
+2      1184..1291    108 
+3      1899..2000    102 
+1      1..102        102 
557 ttgcggtgtgaggctccccgatggttcccccagcccacagtggtc 

LRCEAPRWFPQPTVV 
602 tgggcatcccaagttgaccagggagccaacttctcggaagtctcc 

WASQVDQGANFSEVS 
647 aataccagctttgagctgaactctgagaatgtgaccatgaaggtt 

NTSFELNSENVTMKV 
692 gtgtctgtgctctacaatgttacgatcaacaacacatactcctgt 

VSVLYNVTINNTYSC 
737 atgattgaaaatgacattgccaaagcaacaggggatatcaaagtg 

MIENDIAKATGDIKV 
782 acagaatcggagatcaaaaggcggagtcacctacagctgctaaac 

TESEIKRRSHLQLLN 
827 tcaaaggcttctctgtgtgtctcttctttctttgccatcagctgg 

SKASLCVSSFFAISW 
872 gcacttctgcctctcagcccttacctgatgctaaaataa 910 

ALLPLSPYLMLK* 
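
The ORF Finder listing above pairs each codon with the residue it encodes. A minimal Python sketch of that codon-by-codon translation, assuming the standard genetic code, is given below. The names `CODON_TABLE` and `translate` are illustrative only and are not part of ORF Finder or of the ExPASy Translate tool.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate `dna` in its first reading frame; '*' marks a stop codon.

    Codons containing anything other than A, C, G or T become 'X'.
    (Hypothetical helper, for illustration only.)
    """
    dna = dna.upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# First 45 bases of the +2 ORF (positions 62 to 106), from the listing above.
first_codons = "atggcttccctggggcagatcctcttctggagcataattagcatc"
print(translate(first_codons))

# Length check for the ORF reported at 62..910.
orf_start, orf_end = 62, 910
orf_length_nt = orf_end - orf_start + 1
print(orf_length_nt, orf_length_nt // 3 - 1)  # bases; residues without the stop
```

The length check reproduces the figures in the listing: 849 bases make 283 codons, of which the last is the stop codon, leaving 282 residues.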
http://ca.expasy.org/history.html 






The ExPASy Molecular Biology Server 

History of changes, improvements and new features 

If you subscribe to our Swiss-Flash service of electronic bulletins, you can receive these and other news by 
electronic mail. 

October 14, 2004 

• Tools 

Aldente is a tool to identify proteins from peptide mass fingerprinting data. This new, fast and 
powerful PMF tool uses the Hough transform to determine the mass spectrometer deviation, to 
realign the experimental masses and to exclude outliers ( More information) . 

• Mirror site 

We are happy to announce a new ExPASy mirror site in Brazil, http://br.expasy.org/, hosted by the 
Laboratório Nacional de Computação Científica, Petrópolis. 

June 4, 2004 

• The Melanie page has been restyled. It has been redesigned by the occasion of the announcement of 
SIB, Genebio and Amersham Biosciences joining forces to create one single 2D image analysis. 
Melanie was chosen to be integrated into ImageMaster™ 2D Platinum software. 

April 14, 2004 

• UniProt 

Since the last Swiss-Flash Bulletin, the universal protein resource, UniProt has been released 
publicly. Many ExPASy pages and services have changed to accommodate different aspects of the 
UniProt knowledgebase and UniRef, the non-redundant reference databases of UniProt. 

In particular, the ExPASy BLAST interface now allows to launch a sequence similarity search 
against the UniRef clusters UniRef100, UniRef90 and UniRef50. 

Implicit links to UniRef50 and UniRef90 have been added to the NiceProt view of UniProt 
knowledgebase entries. 

• FTP server structure 

As announced in a previous Swiss-Flash bulletin, the structure of the ExPASy ftp server has 
changed. Please refer to this previous announcement for details. 

• Swiss-Prot/TrEMBL (UniProt knowledgebase) 

A note to Swiss-Prot and TrEMBL users: Please note that we have a long list of planned format 
changes to be introduced in the next few months. 

In the NiceProt view for Swiss-Prot/TrEMBL entries we have added implicit links to the 
Swiss-Model repository of 3D homology models (SMR). 

It is now possible to submit all splice isoforms annotated in one Swiss-Prot entry to a multiple 
alignment, or to retrieve the sequences of all these isoforms, e.g. from 
http://www.expasy.org/cgi-bin/[illegible] or from 
http://www.expasy.org/cgi-bin/[illegible]. 

• PROSITE 

PSview: the view of PROSITE documentation entries contains new functionalities. When a 3D 
structure is described in the text, a direct link to a 3D image of the domain is provided. The 
Swiss-Prot match list of each signature can be visualized as a multiple alignment, or as a taxonomic 
distribution graph. For PROSITE profiles, a domain arrangement view is also provided where active 
sites and disulfide bridges annotated in Swiss-Prot entries are superimposed on PROSITE domains, 
see the following links for more details: http://www.expasy.org/cgi-bin/nicedoc.pl?PDOC50119 
http://www.expasy.org/cgi-bin/[illegible] 

• ENZYME 

Access to ENZYME entries by class, subclass etc. has been improved. It is now possible to easily 
retrieve all ENZYME or Swiss-Prot entries corresponding to a given ENZYME class. This 
functionality is available from a given ENZYME entry or for a given ENZYME class. 

• The legends for the Biochemical Pathways have been made available in html and pdf format. 

• Tools 

Myristoylator is a new tool designed to predict N-terminal myristoylation of proteins by neural 
networks. N-terminal myristoylation is a post-translational modification that causes the addition of a 
myristate group to the N-terminal glycine of an amino acid chain. 

September 26, 2003 

• Swiss-Prot variant web pages 

Missense mutation leading to single amino acid polymorphism (SAP) is the type of mutation most 
frequently related to human diseases. We have created Swiss-Prot Variant web pages to provide a 
summary of available sequence information as well as additional structural information on each 
variant. In particular, wherever possible, SAPs are modeled onto 3D protein structures and the users 
can visualize the models. The 3D models are updated with each weekly Swiss-Prot release. The 
Swiss-Prot variant pages are accessible from the NiceProt view of a Swiss-Prot entry (e.g. P06737) 
on the ExPASy server, via a hyperlink created for the stable and unique identifier FTId of each 
human SAP (e.g. VAR_007908). 

• Recent and forthcoming changes in Swiss-Prot 

With Swiss-Prot release 41, we have introduced two documents to announce recent and forthcoming 
format changes in Swiss-Prot and TrEMBL. These documents replace the corresponding sections of 
the release notes, and contain detailed information about new keywords, new feature keys and 
comment topics, new cross-references and other format changes. Explicit links to new databases will 
no longer be announced here (i.e. under "What's new on ExPASy"), but in the document "Recent 
changes". 

• Implicit links 

Implicit links (i.e. added on the fly to Swiss-Prot/TrEMBL entries when viewed with NiceProt) to the 
following databases have been added recently: 

• GenAtlas - A human gene database, e.g. P04406 

• HOBACGEN - Homologous bacterial genes database, e.g. P02937 

• HOVERGEN - Homologous vertebrate genes database, e.g. P02304 

• TAIR - The Arabidopsis Information Resource, e.g. Q38828 

• WorfDB - The C. elegans ORFeome cloning project, e.g. Q17330 
• WormBase - Database on genetics, genomics and biology of C. elegans, e.g. Q17330 

Information about the criteria for the creation of links to each of these databases can be found in the 
Swiss-Prot document List of databases cross-referenced in Swiss-Prot . 

Whenever reference is made to the Transport Commission (TC) System in Swiss-Prot comments 
lines (CC), a link is added from the NiceProt view to the Transport Protein Database (example: 
P37905). 

• Visualization tool for peptide mass fingerprinting identification results 
We have installed Biograph Applet v2.0, intended for the visualization of results of the PeptIdent, 
FindPept and FindMod tools. Links to Biograph are available as part of PeptIdent, FindPept and 
FindMod result pages. 

March 21, 2003 

New cross-references have been introduced in Swiss-Prot: 

• Explicit links to GeneDB_SPombe, the Schizosaccharomyces pombe GeneDB, example: 
O94534. 

• Implicit links to CleanEx, a gateway to public gene expression data via officially approved 
gene names, example: P02751 . 

There is a new Swiss-Prot document, arath.txt - Index of Arabidopsis thaliana entries and their 
corresponding gene designations. 

An interface to PRATT has been implemented on ExPASy. PRATT is a tool to discover patterns that 
are conserved in a set of protein sequences. The patterns are reported using the PROSITE format . 
The ExPASy BLAST result representation has been modified to allow direct submission of a number 
of sequences to PRATT. 

The ExPASy BLAST interface now allows to perform tblastn searches against individual microbial 
genomes (EMBL genome records, including plasmids). 

Throughout the ExPASy server, the navigation bar at the top of every page now includes a search 
bar, for quick access to Swiss-Prot, TrEMBL, PROSITE, SWISS-2DPAGE, ENZYME, Taxonomy, 
HAMAP and ExPASy site search. 

November 19, 2002 

We are happy to announce a new ExPASy mirror site in Bolivia, http://bo.expasy.org, hosted by the 
Universidad Católica Boliviana in Cochabamba. 

October 25, 2002 

ExPASyBar, a very useful navigation bar to the most important databases and tools on the ExPASy 
server, has been developed by Martin Hassman, with input from the ExPASy team. ExPASyBar is an 
add-on to the Mozilla web browser (i.e. it does not work with Netscape, MS Internet Explorer and 
other browsers). Installation is very simple. ExPASyBar can be configured to use the ExPASy mirror 
of the user's choice (in the Edit/Preferences/Advanced/ExPASyBar menu of Mozilla). 

August 27, 2002 

The last "What's new on ExPASy" is more than a year old, which means that some of the changes 
and "new" features and services are not all that new anymore... We are trying to list here the most 
important changes, and we will try to report new tools and documents again more frequently in the 
future! 
= ExPASy = 

• We are happy to announce that since the beginning of the year 2002, there is an ExPASy 
mirror site in the USA, http://us.expasy.org, hosted by the North Carolina Supercomputing 
Center (NCSC). Some users may have noticed upon their connection to www.expasy.org, that 
they are redirected to a mirror site that is closer to their geographic location, or that is less 
heavily loaded. If you feel that you are redirected to a mirror site for which the network 
connection is slow, please let us know . 

• News on the FTP server : 

1. PROSITE updated data and documentation files are now made available via FTP even 
between releases, in the directory /databases/prosite/release_with_updates/. This data 
always corresponds to the version of the database available for web access via the 
PROSITE page. 

2. Up-to-date plain-text versions of all SWISS-PROT documents can be downloaded by 
ftp, in the directory /databases/swiss-prot/updated_doc/. 

3. Three "special selections" have been added: 

■ merops.seq - all SWISS-PROT entries cross-referenced to the MEROPS database 

■ mitoch.seq - Mitochondrion encoded proteins (entries with "Mitochondrion" on 
OG lines) 

■ plastid.seq - Chloroplast and cyanelle encoded proteins (entries with "Chloroplast" 
or "Cyanelle" on OG lines) 

= TOOLS = 

Two tools have been added to our collection of sequence analysis and proteomics tools : 

• The Sulfinator predicts tyrosine sulfation sites in protein sequences, based on Hidden Markov 
Models. 

• PeptideCutter predicts potential cleavage sites cleaved by proteases or chemicals in a given 
protein sequence. It displays the query sequence with the possible cleavage sites mapped on it, 
as well as a table of cleavage site positions. 

Major updates have been performed on two tools: 

• The ScanProsite interface has been remodeled, with more options and databases, and a 
graphical view of the results. A standalone program, ps_scan, is now available. 

• The BLAST interface now allows searches in PDB. The output page displays hits found with 
Pfam HMMs and PROSITE profiles on the query sequence. 

= SWISS-PROT = 

• New cross-references have been introduced to various databases: AraC-XylS, Genew, 
Gramene, several 2D-PAGE databases, Ensembl, GeneLynx, ListiList, ModBase, PhosSite, 
ProtoNet, Source and TIGRFAMs. 

The List of databases cross-referenced in SWISS-PROT contains, for each database, a short 
description, the link type (explicit or implicit), and the server URL. In the case of explicit links, 
you can click on the word "Explicit" (example: Genew) to retrieve a list of all SWISS-PROT 
entries linked to the corresponding database. 

The cross-references to the following databases have been deleted, because the databases are 
either no longer available on the WWW, or because they have become commercial even for 
academic users: CarbBank, DOMO, GCRDb, Mendel, YEPD and YPD. 

• The NiceProt view of SWISS-PROT has been further improved: access to documentation has 
been facilitated by adding "mouse-over" hypertext links from various sections in NiceProt to 
the corresponding information in the user manual. Those hypertext links, which give access to 
documentation rather than the data related to the protein entry, are visually different from the 
ordinary hyperlinks. While they are not immediately recognizable as such, the user can see that 
they are clickable by moving the mouse pointer over the section headings such as "References" 
or "Keywords". A short description of the linked information appears at the bottom of the web 
browser, and when clicked, a small additional window is opened with related information 
extracted from the user manual. 

Similarly, in the "Cross-references" section, the names of the databases to which an entry is 
cross-referenced are linked to the corresponding sections in the document dbxref.txt (List of 
databases cross-referenced in SWISS-PROT). 

• Three SWISS-PROT documents have been released since the last announcement: 

o bucai.txt - Index of Buchnera aphidicola (subsp. Acyrthosiphon pisum) entries 
o mycpn.txt - Index of Mycoplasma pneumoniae strain M129 entries 
o plasmid.txt - List of plasmids

• The Human proteomics initiative (HPI) status report page has been remodeled and now
contains more detailed information about the status of annotation of human SWISS-PROT 
entries. 

• The HAMAP project aims to semi-automatically annotate complete bacterial and archaeal
proteomes up to the quality standards of SWISS-PROT. Several proteomes have already been
completed and are continuously updated. Up-to-date statistics are available on the HAMAP
status page.

• Note that upcoming format changes in the next SWISS-PROT release are always described in 
the release notes for the current release. 

• Although not hosted physically on the ExPASy server, the NEWT Taxonomy browser is 
provided and maintained by members of the SWISS-PROT group, and serves as an entry point 
into SWISS-PROT and TrEMBL using taxonomic search criteria.

= SWISS-2DPAGE = 

New cross-references, reference maps, and a document have been added: 

• Cross-references to recent fully federated 2-DE databases, built with the Make2ddb package,
are provided. These are now COMPLUYEAST-2DPAGE, PHCI-2DPAGE, PMMA-2DPAGE
and Siena-2DPAGE. The list of links is updated each time a SWISS-2DPAGE release is
completed.

• SDS-PAGE and 2-D PAGE of nucleolar proteins from Human HeLa cells have been added to
the list of reference maps. These masters are named respectively
NUCLEOLI_HELA_1D_HUMAN and NUCLEOLI_HELA_2D_HUMAN.

• A FAQ (Frequently Asked Questions) has been provided. We hope you will find answers to 
most of your questions in this new document.

June 30, 2001 

New cross-references have been added from relevant SWISS-PROT entries to three databases: 

• SMART - Protein domain database (example: O43707).

• Leproma - Database dedicated to the analysis of the genome of the leprosy bacillus
Mycobacterium leprae (example: Q9CBW4).

• MypuList - Mycoplasma pulmonis genome database (example: P58174).

The keyword "Complete proteome" has been introduced to all SWISS-PROT/TrEMBL entries
describing a protein which is thought to be expressed by an organism whose genome has been
completely sequenced. This keyword is so far only used for microbial (bacterial and archaeal)
proteins. A complete set of proteins from a microbial genome can therefore be obtained using this
keyword across SWISS-PROT and TrEMBL.

A new and improved version of the NiceProt view of SWISS-PROT is available (example). Some of
its new features are: 

• It provides a link to a printer-friendly view of a SWISS-PROT entry. 

• It displays the length of certain features in the FT lines. 

• It provides access to a new tool, the 'Feature aligner', which allows features to be selected
for submission to the ClustalW multiple alignment program.

SWISS-PROT release statistics are now available for every update of the database. Among other 
parameters, statistics about database growth, average sequence lengths and amino acid composition, 
taxonomic origin, journal citations and database cross-references are presented, including some 
graphics. 

A new view is available within the SRS Sequence Retrieval System . It displays, for each protein 
corresponding to a user query, gene name(s) and organism (in addition to the parameters ID, AC, 
description and sequence length which are displayed by the default view "Short description"). This 
new view is entitled "Long description" and is available from the menu "Use view ..." in the SRS
query form. 

The SIB Blast interface (accessible also via "Quick BLAST" or from the bottom of every
SWISS-PROT/TrEMBL entry) now offers the possibility to restrict the similarity search by using
taxonomic criteria. A "Taxonomic View" of the results can also be obtained via the BLAST result
page.

The Swiss-Prot team is pleased to present the first article of "Protéines à la Une", its
new popular-science column devoted to proteins that are making the news.

January 18, 2001 

• SWISS-PROT 

New cross-references have been added to three additional databases: 

• GlycoSuiteDB - a database of glycan structures; explicit links
example: P00750 

• GeneCensus - a compilation of ORF data for the Saccharomyces genome; implicit links 
example: 201802 

• HUGE - a database of human unidentified gene-encoded large proteins; implicit links
example: P42330 

• NiceProt & SIB BLAST: The NiceProt view of SWISS-PROT/TrEMBL entries now contains a
direct submission button requesting a blastp homology search of the protein against
SWISS-PROT/TrEMBL/TrEMBLnew on the SIB BLAST server ("Quick BlastP search"). In the
results of SIB BLAST searches on ExPASy (normal or "NiceBlast" output formats), the user can
select a number of matching sequences and directly submit them to a ClustalW search, or retrieve
and download the corresponding SWISS-PROT/TrEMBL entries.

• Proteomics tools 

• FindPept: This new tool can identify peptides that result from unspecific cleavage of proteins
from their experimental masses, taking into account artefactual chemical modifications,
post-translational modifications (PTM) and protease autolytic cleavage.

• PeptIdent: Several new features have been added.

o When searching SWISS-PROT, all alternative splice isoforms described in
SWISS-PROT feature tables are included in the search (e.g. Isoform 12S of O43184).

o New organism classes can be searched. For each of the available taxonomic classes
(e.g. Mammalia), a new section (e.g. other Mammalia) has been added, which comprises
all entries not corresponding to any of the searchable subclasses (e.g. all Mammalia
except human, bovine, rabbit, and other rodents).

o For each matching protein in a PeptIdent result, buttons are available which allow further
analysis of the protein by direct submission of the data to FindMod, FindPept,
GlycoMod, PeptideMass and BioGraph.

• GlycoMod : Possible oligosaccharide structures suggested by GlycoMod are linked to the 
GlycoSuiteDB database of glycan structures, if they are reported in this database. The user can
also select to display compositions reported in GlycoSuiteDB separately from the compositions 
not known in the database. 

October 28, 2000 

Several new features have been implemented on ExPASy during the last few months: 
o The Swiss Center for Scientific Computing (CSCS) and the Swiss Institute of 
Bioinformatics provide a powerful and rapid new BLAST server. A submission form to 
this server is available from the bottom of each SWISS-PROT/TrEMBL entry on
ExPASy. Results of blastp similarity searches submitted from this form are now parsed
and displayed in a more user-friendly way, including a graphical representation and a
link to NiceBlast. NiceBlast is an HTML table detailing complete descriptions of all
matching proteins, including the full protein name, gene name, sequence length and 
organism. 

o Sequences of alternatively spliced isoforms of the same protein are documented in the 
feature table of that protein sequence record. In collaboration with the SWISS-PROT 
group at EBI, a program varsplic.pl has been written to generate additional records from 
SWISS-PROT and TrEMBL, one for each splice isoform of each protein. The resulting 
data sets for SWISS-PROT and TrEMBL are available on our ftp server, along with a 
more detailed description of the project and information on how to obtain a local copy of 
the varsplic.pl program. 

The additional isoform entries have been added to the SWISS-PROT/TrEMBL
databases underlying the BLAST server at SIB/CSCS Switzerland, and ScanProsite.
Gradually, all other tools on ExPASy will be modified to handle splice isoforms. The 
NiceProt view of SWISS-PROT/TrEMBL provides links from the isoform name in the 
feature table (example: Q01432) to a page displaying the sequence of the corresponding 
isoform. 
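
The generation of one sequence record per splice isoform, as described above for varsplic.pl, can be illustrated with a minimal Python sketch. It is not the varsplic.pl program itself: the `SpliceEdit` structure, the `build_isoforms` function and the example sequence are hypothetical, and the sketch assumes each feature-table edit is given as a 1-based inclusive range on the canonical sequence plus a replacement string (empty when the segment is missing in the isoform).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpliceEdit:
    """One hypothetical feature-table edit: positions are 1-based and inclusive."""
    start: int
    end: int
    replacement: str          # "" means the segment is missing in the isoform
    isoforms: Tuple[str, ...]  # names of the isoforms that carry this edit


def build_isoforms(canonical: str, edits: List[SpliceEdit]) -> Dict[str, str]:
    """Return one sequence per isoform name mentioned in the edits."""
    names = sorted({name for edit in edits for name in edit.isoforms})
    sequences = {}
    for name in names:
        seq = canonical
        own_edits = [e for e in edits if name in e.isoforms]
        # Apply edits from the C-terminal end first so that earlier
        # coordinates stay valid while the sequence changes length.
        for edit in sorted(own_edits, key=lambda e: e.start, reverse=True):
            seq = seq[:edit.start - 1] + edit.replacement + seq[edit.end:]
        sequences[name] = seq
    return sequences


edits = [
    SpliceEdit(2, 4, "", ("2",)),          # residues 2-4 missing in isoform 2
    SpliceEdit(8, 10, "GG", ("2", "3")),   # residues 8-10 replaced in isoforms 2 and 3
]
isoforms = build_isoforms("MKTAYIAKQR", edits)
```

Applying the edits from the end of the sequence backwards is a design choice that avoids recomputing coordinates after each insertion or deletion.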

o In the framework of the HAMAP project, we provide clean non-redundant
SWISS-PROT/TrEMBL data sets for all completely sequenced microbial genomes.
These files are available on the ExPASy ftp server in SWISS-PROT and Fasta format,
and can also be used for similarity searches on the SIB Blast server ("microbial
proteomes").

A Genomic Proximity Viewer is available for those microbial genomes where an ORF
numbering system exists. For those organisms, it is possible to click on the ORF name in
the SWISS-PROT/TrEMBL GN (gene) lines to obtain a list of proteins encoded by
genes in proximity (example: P46448). The tool is also accessible from the HAMAP
complete proteome pages of organisms. Example: Borrelia burgdorferi.

o The following cross-references have been added to relevant SWISS-PROT/TrEMBL
entries:

■ InterPro - the Integrated Resource of Protein Families, Domains and Sites,
integrating PROSITE, Pfam, PRINTS and ProDom. A link to a graphical view of
the domain structure is also available; example: O15197.

■ MEROPS - a peptidase database; example: O96009.

■ NucleaRDB - a database of nuclear receptors (implicit links); example: O09018.

■ DIP - Database of Interacting Proteins (implicit links); example: P10275.

o The Compute pI/Mw tool, if called for a list of proteins, can now produce, in addition to
the usual verbose format, a table in text format that can be exported to an external 
application. 

o Protein Spotlight is a periodical electronic review from the SWISS-PROT group. It is 
published on a monthly basis and consists of articles focused on particular proteins of 
interest. You can subscribe to receive each issue, free of charge, in HTML or PDF
format.

April 26, 2000 

Proteomics Tools: 

• We are happy to announce a new tool in our suite of ExPASy protein identification and 
characterization tools:

GlycoMod is a tool that can predict the possible oligosaccharide structures occurring on
proteins from their experimentally determined masses. The program can be used for free or
derivatized oligosaccharides and for glycopeptides. GlycoMod has been developed in
collaboration with Nicolle Packer, initially at Macquarie University, Sydney, and later at
Proteome Systems Ltd. GlycanMass is an associated tool which calculates the mass of
an oligosaccharide structure from its oligosaccharide composition.

• Detailed documentation is now available for the PeptIdent peptide mass fingerprinting
identification tool. 

• A number of new functionalities have been added to FindMod:

o Results can now be obtained by email (as an alternative to receiving them on-line in the
browser window), in the form of an HTML file, with exactly the same functionality as for
on-line display.

o Several new enzymes have been added, mainly different versions of Chymotrypsin.

o Results given in the "potential amino acid substitutions" table have been refined:

■ We no longer suggest amino acid (aa) substitutions occurring on the enzyme
cleavage site and substituting the aa for an aa at which the enzyme does not cleave.

■ If the suggested aa substitution corresponds to a sequence variant or conflict as
annotated in the SWISS-PROT feature table, this substitution is highlighted in
color (green background for that table line), and a hypertext link is provided to the
corresponding annotated variant or conflict.

• Compute pI/Mw can now be used with a file uploaded from the user's computer, if this file
contains a list of SWISS-PROT/TrEMBL IDs/ACs.

SWISS-PROT: 

• Dotlet, a diagonal dot-matrix program drawing a dotplot of two sequences, has been included
in the set of tools that can be called directly from the bottom of each SWISS-PROT/TrEMBL
entry on ExPASy. This makes it possible to find repeats within the sequence.

• In the last few months, cross-references to the following databases have been added to relevant
SWISS-PROT entries:

o TubercuList - for entries from Mycobacterium tuberculosis 
o PRINTS - Protein fingerprint database 

o implicit links to BLOCKS - a database of multiply aligned ungapped segments 
corresponding to the most highly conserved regions of proteins. 
Example entry: Q50705 . 

• There are 6 new SWISS-PROT documents : 

o humchr08.txt : Index of protein sequence entries encoded on human chromosome 8. 
o humchr09.txt : Index of protein sequence entries encoded on human chromosome 9.
o humchr10.txt : Index of protein sequence entries encoded on human chromosome 10.
o humchr11.txt : Index of protein sequence entries encoded on human chromosome 11.
o dbxref.txt : List of databases cross-referenced in SWISS-PROT. 
o rprowaze.txt : Index of Rickettsia prowazekii strain Madrid E entries. 

ExPASy: 

• We are happy to announce a new ExPASy mirror site, at Peking University, China: 
http://expasy.pku.edu.cn/ . 

• We have completely revised the ExPASy server access statistics, which were previously 
frequently incomplete and erroneous. Every month, a table is updated which lists monthly 
access statistics for the main Swiss ExPASy server and for all our mirror sites.

October 4, 1999 

• The ExPASy server has a new mirror site for North America, at the Canadian Bioinformatics 
Resource in Halifax, Canada. It can be reached at the URL http://expasy.cbr.nrc.ca/ . 

• The SWISS-PROT search by description tool has been extended to TrEMBL. 

• There are five new SWISS-PROT documents : 

o humchr12.txt : an index of protein sequence entries encoded on human chromosome 12.
o humchr14.txt : an index of protein sequence entries encoded on human chromosome 14.
o humchr15.txt : an index of protein sequence entries encoded on human chromosome 15.
o humchr16.txt : an index of protein sequence entries encoded on human chromosome 16.
o annbioch.txt : SWISS-PROT annotation: how is biochemical information assigned to
sequence entries?

• When scanning a pattern against the SWISS-PROT/TrEMBL databases using the ScanProsite
tool, users can now restrict their searches to an organism or a taxonomic range. 

• The NiceSite view of PROSITE (example: PS00101) has been modified to include two new
statistical values in its section of numerical results, namely 
Precision (true hits / (true hits + false positives)) and 
Recall (true hits / (true hits + false negatives)). 

• A new parameter has been added to the list of parameters computed by the ProtParam tool:
The program now calculates the atomic composition of a protein, in addition to molecular
weight, theoretical pI, amino acid composition, extinction coefficient, estimated half-life,
instability index, aliphatic index and grand average of hydropathicity (GRAVY).
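
The two statistical values added to the NiceSite view follow directly from the formulas quoted above. A minimal Python sketch is given below; the function names and the example counts are illustrative only and are not taken from any PROSITE entry.

```python
def precision(true_hits: int, false_positives: int) -> float:
    """Precision = true hits / (true hits + false positives)."""
    total = true_hits + false_positives
    if total == 0:
        raise ValueError("precision is undefined when there are no reported hits")
    return true_hits / total


def recall(true_hits: int, false_negatives: int) -> float:
    """Recall = true hits / (true hits + false negatives)."""
    total = true_hits + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no true members")
    return true_hits / total


# Illustrative counts: 90 true hits, 10 false positives, 30 false negatives.
example_precision = precision(90, 10)
example_recall = recall(90, 30)
```

With these illustrative counts, 90 of the 100 reported hits are correct and 90 of the 120 true family members are found.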

June 16, 1999

The 'Nice' view tools for the databases provided on ExPASy (SWISS-PROT, SWISS-2DPAGE,
PROSITE, ENZYME) have been developed in order to provide users with an easily readable
alternative to the original text file representation.
The following tools are available:

Database        Tool
SWISS-PROT      NiceProt
SWISS-2DPAGE    Nice2DPage
PROSITE         NiceSite
ENZYME          NiceZyme

We have now changed all our tools and database search programs on ExPASy to display the 'Nice'
version of a database entry by default. The programs displaying database entries in their original text
formats continue to be maintained, and links are available from the 'Nice' views to the corresponding
get-xxx-entry programs (e.g. get-sprot-entry).

If you maintain pages with links to entries from the above-mentioned databases, you might be
interested in updating these links to use the 'Nice' view if you prefer this representation to the original
format. Otherwise you are, of course, completely free to keep the get-xxx-entry links.

May 24, 1999

• Linking to ExPASy 

We have revised the ExPASy file and directory structure, in order to have the vast amount of 
data that has accumulated on ExPASy since September 1993 available in a more structured
manner, and to facilitate replication on our mirror sites. This has caused certain changes in 
html links, and we would like to ask our users to update their bookmarks and links 
accordingly. If in doubt, please refer to the document 'How to create html links to ExPASy'.
At the same time we wish to reiterate our announcement of the ExPASy mirror sites in Taiwan 
and Australia . For your own convenience, please use the mirror site closest to you. Regular 
users might also bookmark the addresses of all ExPASy mirror sites to use as backup for the
rare cases that their favourite ExPASy site is down or unreachable due ...

Please make sure to update all pointers using the old domain expasy.hcuge.ch, which was
replaced by http://www.expasy.ch/ in March 1997 (!). The 'expasy.hcuge.ch' address might be
disabled in the near future.

• Protein identification tools 

AACompIdent and MultiIdent have been revised, and the database choice has been extended
to include TrEMBL. Results are now sent to the user in html format (rather than text only), and
html links allow direct access to the matching SWISS-PROT/TrEMBL entries.

• SWISS-PROT cross-references

SWISS-PROT entries from Escherichia coli with DR ECOGENE lines are now
directly linked to EcoGene at the University of Miami.

There is a new type of cross-reference line for sequence entries from Brachydanio rerio
(Zebrafish): these entries are now linked to the Zebrafish Information Network (ZFIN) at the
University of Oregon.

• New features have been added to improve interactivity in accessing SWISS-2DPAGE:

o All searching functions in the database can be accessed from the top page and results
page of each keyword search function (example: search by description). This feature has
been designed to facilitate the navigation between the different ways to query the
database (by description, by accession number, by authors, by full text search).

o A new tool is provided to retrieve in a table all the protein entries identified on a given
reference map, with all 2-DE information: spot serial number, pI, Mw, mapping
procedure, references (example).

o A new way to query the database is provided: from a user-entered amino acid
sequence, one can display the estimated location on a chosen reference map (example).

February 26, 1999 

• Several new features have been added to the PeptIdent peptide mass fingerprinting
identification tool:

o It is now possible to search SWISS-PROT and/or TrEMBL.

o In the page displaying the PeptIdent results, a button allows a new search to be performed
with slightly modified parameters by giving access to the PeptIdent form filled in with
all previously used parameters.

o For each matching protein, a direct link to BioGraph gives access to a graphical
representation of the results of the PeptIdent query. BioGraph was developed by Daniel
Doubrovkine and Anton Soudovtsev as a student project in the scope of the
Bioinformatics course given at Geneva University.

o The sequence portion covered by the matching peptides can optionally be displayed
and highlighted in colour, as well as the difference between pI and Mw values of the
matching proteins and the user-specified values.

• In the results of the SIM binary sequence alignment tool, a direct link has been added to the
pRSS program from EMBnet-CH which evaluates the significance of a protein sequence 
similarity score. 

• Direct links have been added from the comments (CC) lines of relevant SWISS-PROT entries
to the SWISS-PROT documents listing ribosomal protein families (e.g. RL2_ECOLI),
aminoacyl-tRNA synthetases (e.g. SYC_HUMAN) and 7-transmembrane G-linked receptors
(e.g. AA3R_MOUSE).

• Since the introduction of organism classification (OC) terms of the NCBI taxonomy with 
SWISS-PROT release 37, OS (organism species) lines have been linked to the corresponding
pages of the NCBI taxonomy browser.

• The PROSITE full text search tool has been improved. As in the SWISS-PROT/TrEMBL full
text search program, wildcards can be used in query strings and search keywords can be
combined with boolean operators.

• We have developed Nice2DPage, a tool that provides a user-friendly tabular view of 
SWISS-2DPAGE entries (example). The 'Nice2DPage View of SWISS-2DPAGE' is accessible
from the top of each SWISS-2DPAGE entry on ExPASy. 

• New hypertext cross-references have been added to SWISS-2DPAGE entries (e.g. P02997):

o from the 2D comments lines (MAPPING, EXPRESSION LEVEL...), direct links have
been added to the concerned citation in the SWISS-2DPAGE entry

o from the 2D lines concerning AMINO ACID COMPOSITION and PEPTIDE MASSES 
data, direct links have been added to the concerned section in the user manual describing 
data format and protocols.

• Links to the Brenda enzyme database have been added to ENZYME entries.

October 27, 1998

• The SWISS-PROT/TrEMBL and SWISS-2DPAGE full text search tools have been improved.
The databases are now indexed using the Glimpse search engine, wildcards can be used in 
query strings, more fields (line types) are indexed and response times are much shorter than 
before. 

• We have developed NiceProt, a tool that provides a user-friendly tabular view of 
SWISS-PROT entries (example). The 'NiceProt View of SWISS-PROT' is accessible from the
bottom of each SWISS-PROT entry on ExPASy. 

• The following database cross-references and literature references have been added to
SWISS-PROT entries on ExPASy:

o DR links to the PRESAGE resource for structural genomics from Stanford University
(e.g. P53878);

o DR links from relevant immunoglobulin entries to IMGT, the international
ImMunoGeneTics database from the University of Montpellier (e.g. P01876);

o References to the Worm Breeder's Gazette in the RL lines of relevant entries from
Caenorhabditis elegans (e.g. O09517).

• Users who wish to save and retrieve all SWISS-PROT entries originating from a species can
do this via the SWISS-PROT document 'List of organism identification codes': By clicking on
any of the species codes (e.g. DROME) and specifying a filename, one can save all
corresponding entries to a file which can be retrieved from the anonymous ExPASy FTP
server.

• The output format of the PeptIdent peptide mass fingerprinting identification tool has been
improved. PeptIdent results now contain a table summarizing information about the matching
proteins, from where the user can jump to the detailed listing for the corresponding peptides. 

• The new experimental tool CombSearch provides a unified interface for simultaneous queries
to several protein identification programs accessible on the web. CombSearch was written by
Remi Hammerli and Pavel Dobrokhotov as a student project in the scope of the Bioinformatics
course given at Geneva University.

• A new page providing links to conferences and events is available and accessible from the
ExPASy home page. If you know about any conferences on molecular biology or
bioinformatics we encourage you to register them.

• The ExPASy interfaces which allow the direct submission of a SWISS-PROT/TrEMBL
sequence to BLAST servers at EMBnet-CH and NCBI have been modified to provide a more
transparent selection menu of BLAST programs and databases. These programs are designed
for similarity searches easily accessible from a SWISS-PROT/TrEMBL entry; for advanced
searches with more options we recommend using the original BLAST submission forms at
EMBnet-CH or NCBI.

August 24, 1998

• There is a new tool in our section 'Protein identification and characterization tools':

PeptIdent allows the identification of proteins using pI, Mw and peptide mass fingerprinting
data. Experimentally measured, user-specified peptide masses are compared with the
theoretical peptides calculated for all proteins in SWISS-PROT. A species (or group of
species) can also be specified for the search. PeptIdent makes extensive use of the annotations
in SWISS-PROT and takes into account post-translational modifications as documented in
SWISS-PROT.

Results are displayed on-line or can be sent by email, in the form of an HTML table. The result file
contains direct links to FindMod to further characterize matching proteins by predicting
potential protein post-translational modifications and finding potential single amino acid
substitutions, and to PeptideMass.

• There is a new document describing how to create HTML links to services on ExPASy.

• In July 1998, SWISS-PROT, PROSITE and ENZYME underwent major releases.

• New hypertext cross-references have been added to SWISS-PROT entries (example: P98073): 

o in RX lines: Medline abstracts corresponding to SWISS-PROT references can now also
be consulted at the Weizmann Institute of Science in Israel, in addition to the archives at
NCBI, ExPASy and GenomeNet Japan. These links have also been added to
SWISS-2DPAGE entries.

o DR DOMO lines have been added: These links provide direct access to relevant
information in the DOMO database of homologous protein domains maintained by
Jérôme Gracy at Infobiogen.

o At the bottom of the page displaying a SWISS-PROT/TrEMBL entry, there are now
direct links for submission of the sequence to ScanProsite and ProfileScan.

o RL lines: Relevant SWISS-PROT entries are now directly linked to the Plant Gene
Register, an electronic publication for articles describing the isolation and DNA
sequence determination of plant genes (example: P48422).

o The ExPASy interface to the BLAST server at EMBnet-CH now uses their new
BLAST2 client replacing WU-BLAST.

June 13, 1998

• The ExPASy server presents itself in a new layout: the home page, database entry pages, the 
tools page and many other pages have been redesigned for easier navigation and better 
readability. 

Users can now also use (in addition to the homepage and ExPASy Index) the newly created
clickable ExPASy site map to find useful tools, documents and services available on our
server, and to find out about functional links between them.

A new documentation page has been created which presents a complete table of documents 
available on ExPASy. 

• There are two new SWISS-PROT documents:

o humpvar.txt : an index of human proteins with sequence variants

o humchr17.txt : an index of protein sequence entries encoded on human chromosome 17.

• Protein domains, chains etc. documented in the SWISS-PROT feature tables, if corresponding
to subsequences of at least 10 amino acids, can now be directly submitted to a BLAST
similarity search from the pages highlighting these subsequences. Example: DOMAIN
EXTRACELLULAR ALPHA-1 (1A24_HUMAN).

• Two bugs have been corrected in ExPASy tools:

o There was a small error in the computation of extinction coefficients by ProtParam: The
contribution of Cysteines to the extinction coefficient (Gill S.C., von Hippel P.H., Anal.
Biochem. 182:319-326(1989)) of a protein is only half of the values used previously in
ProtParam, which results in slightly different values for the extinction coefficient.

o Our Translate tool no longer ignores base-ambiguity characters such as M, W, Y, etc.
Previously performed translations for DNA sequences containing characters other than
A, C, T, U, G and N are likely to have been incorrect.

We apologize for any inconvenience caused by these errors and encourage our users to
continue to send us their comments and bug reports.
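
To illustrate why base-ambiguity characters matter during translation, here is a minimal Python sketch. It is not the code of the Translate tool: the function name and the policy of writing 'X' for an unresolved codon are assumptions. It uses the standard genetic code and the IUPAC nucleotide ambiguity codes, and emits an amino acid only when every possible expansion of a codon encodes the same residue.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(sequence: str) -> str:
    """Translate frame 1; ambiguous codons give 'X' unless all expansions agree."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        options = [IUPAC[base] for base in dna[i:i + 3]]  # KeyError on unknown symbols
        residues = {CODON_TABLE["".join(codon)] for codon in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)
```

For example, `translate("GCN")` gives alanine because all four GCx codons encode it, whereas `translate("AAN")` gives 'X' because AAx codons encode either asparagine or lysine.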

March 27, 1998

• There is a new tool in our section 'Protein identification and characterization tools':
FindMod is a program for the de novo discovery of protein post-translational modifications. It
examines peptide mass fingerprinting results of known proteins for the presence of currently
18 types of PTMs of discrete mass. This is done by looking at mass differences between
experimentally determined peptide masses and theoretical peptide masses calculated from a
specified protein sequence. If a mass difference corresponds to a known PTM not already
annotated in SWISS-PROT, "intelligent" rules are applied that examine the sequence of the
peptide of interest and make predictions as to what amino acid in the peptide is likely to carry
the modification.
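
The mass-difference idea behind this approach can be sketched in a few lines of Python. This is only an illustration, not FindMod's implementation: the four PTMs listed, the candidate-residue rules, the default tolerance and the example masses are simplifying assumptions, whereas the real tool considers 18 PTM types and its own rule set.

```python
# Monoisotopic mass shifts (Da) and a simplified, assumed list of residues
# that may carry each modification.
PTM_RULES = {
    "acetylation": (42.01057, "K"),
    "phosphorylation": (79.96633, "STY"),
    "methylation": (14.01565, "KR"),
    "hydroxylation": (15.99491, "PK"),
}


def find_modifications(experimental_masses, theoretical_peptides, tolerance=0.02):
    """Return (observed mass, peptide, PTM name, candidate positions) tuples.

    theoretical_peptides maps each peptide sequence to its unmodified mass.
    Positions are 1-based within the peptide.
    """
    hits = []
    for observed in experimental_masses:
        for peptide, theoretical in theoretical_peptides.items():
            delta = observed - theoretical
            for ptm, (shift, residues) in PTM_RULES.items():
                if abs(delta - shift) > tolerance:
                    continue
                positions = [i for i, aa in enumerate(peptide, start=1) if aa in residues]
                if positions:  # keep the hit only if a plausible residue exists
                    hits.append((observed, peptide, ptm, positions))
    return hits


# Made-up masses for illustration: one peptide with a nominal mass of 500.0 Da.
hits = find_modifications([579.966, 514.0157, 500.5], {"GASPK": 500.0})
```

In this made-up example the first observed mass differs from the theoretical one by about 79.966 Da, so the serine at position 3 is proposed as a phosphorylation site.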

• Improved tools:

PeptideMass, which calculates masses of peptides and their post-translational modifications for
a given protein sequence, can now consider up to 3 missed cleavages. Post-translational
modifications may be specified for a sequence in raw sequence format, and substitution tables
are available to simplify the interpretation of the results for peptides concerned by database
conflicts, variants or splicing variants.

TagIdent can now search in SWISS-PROT, TrEMBL or both databases. It is also possible to
perform an additional scan of a short sequence tag against all fragments contained in the
database(s), even if pI and Mw cannot be computed for these proteins.

MultiIdent (identification using pI, Mw, amino acid composition, sequence tag and peptide
mass fingerprinting data) is available for constellation 2 (Ala, Ile, Pro, Val, Arg, Leu, Ser, Asx,
Lys, Thr, Glx, Gly, Met, His, Phe and Tyr; Asp+Asn=Asx; Gln+Glu=Glx; Cys and Trp are
not considered) and constellation 4 (like constellation 2, but Gly is not considered).
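
The notion of missed cleavages mentioned for PeptideMass can be illustrated with a small Python sketch. It is not the PeptideMass code: it assumes a trypsin-like rule (cleave after K or R unless the next residue is P), reports neutral monoisotopic masses, and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def cleavage_fragments(sequence):
    """Split after K or R, except when the following residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def peptides_with_missed_cleavages(sequence, max_missed=3):
    """Return (peptide, number of missed cleavages) pairs."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed < len(fragments):
                peptides.append(("".join(fragments[i:i + missed + 1]), missed))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER
```

A peptide with n missed cleavages is simply the concatenation of n + 1 adjacent fragments of the complete digest, which is why the second function joins consecutive fragments.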

• Several months ago, we started to distribute and update weekly a set of data files that can be
used to build a non-redundant protein sequence database consisting of SWISS-PROT,
TrEMBL and TrEMBL updates. There is now a document explaining the contents and
principles of this database.

• Information about the current release and update status of SWISS-PROT has been added to the 
SWISS-PROT page (currently 'Release 35 and updates up to 20-Mar-1998: 71198 entries').

• New hypertext cross-references have been added to SWISS-PROT entries:

o in RX lines: Medline abstracts corresponding to SWISS-PROT references can now also
be consulted on the Japanese GenomeNet server in addition to the archives at NCBI and
ExPASy.

o in DR PDB lines: Local copies of PDB entries are available. The user is now given the
choice between accessing 3D structure information (e.g. 2hhe) in Geneva or Brookhaven
(US). Both links provide direct access to 3D structure information in various formats, as
well as hypertext links to servers offering related information.

o DR PROTOMAP lines have been added: These links provide, for a SWISS-PROT entry,
a cluster (group) of related proteins as classified by the ProtoMap server at Hebrew
University, Jerusalem.
Example: DEFN_HUMAN.

• SWISS-2DPAGE can now be searched with the SRS Sequence Retrieval System.

February 13, 1998

• The SWISS-PROT full text search tool has been redesigned and improved. Boolean operators
(AND, OR, NOT) can be used to combine and restrict queries, and special characters such as 
-#'(),./ are allowed as part of words (as used in SWISS-PROT). 

• SWISS-PROT author names (RA lines) have been linked to a page listing all SWISS-PROT
entries which contain references to articles (co-)authored by this author.

• The ExPASy interface to the EMBNet-CH BLAST server now contains a new option: This 
BLAST process manages two job queues: a (presumably) fast one and a slow one. Based on 
the sequence provided and the database requested, the process makes an (educated ???) guess 
to decide if the query will require more than 5 minutes of CPU time. Small jobs are allowed to 
proceed in the fast queue, while the others are forced to the slower one. If an e-mail address is 
provided, results of slow jobs will be automatically mailed back, while fast jobs will proceed 
as before. 

• Two features have been added in SWISS-2DPAGE to facilitate visualisation and 
differentiation of spots: 

o If you click on a spot in one of the SWISS-2DPAGE maps (e.g. Plasma), the 2D line 
describing this spot in the corresponding SWISS-2DPAGE entry is highlighted in green. 

o Hypertext links have been added from spot serial numbers on SWISS-2DPAGE 2D 
lines to the master image for the protein, in which the spot with this serial number is 
highlighted in green (in contrast to the other spots displayed in red). Example: P00450. 

January 13, 1998 

• SWISS-PROT, PROSITE, ENZYME and SWISS-2DPAGE have all gone through major releases. 

• There is a new program that allows you to randomly retrieve a SWISS-PROT or TrEMBL 
entry. 

• A new output format option has been added to our Translate tool. When translating a 
nucleotide sequence into a protein sequence, you can now also select to include, for each of the 
six open reading frames, the nucleotide sequence in the output. 

• Cross-references and direct links to the Mendel Plant Gene Nomenclature Database have been 
added in corresponding SWISS-PROT entries. Example: P12084. There is also a file 
containing all SWISS-PROT entries with cross-references to Mendel in our series of "special 
selections", which is updated weekly and can be downloaded from our anonymous FTP server. 

• Proteins which are documented to belong to an uncharacterized protein family in the 
SWISS-PROT CC (comments) lines, have been linked to the SWISS-PROT document 
upflist.txt. Example: P55061. 
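
As a rough illustration of the six-frame output described for the Translate tool above, the following Python sketch translates a nucleotide sequence in all six reading frames and keeps the nucleotide text of each frame next to its translation. It assumes the standard genetic code and is not the ExPASy implementation; the function names are made up for this example.

```python
# Standard genetic code, laid out in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame label ('+1'..'+3', '-1'..'-3') to (nucleotides, protein)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            nuc = seq[offset:]
            nuc = nuc[: len(nuc) - len(nuc) % 3]  # drop the incomplete last codon
            frames[f"{strand}{offset + 1}"] = (nuc, translate(nuc))
    return frames
```

Calling `six_frames("ATGGCCTAA")` returns one `(nucleotides, protein)` pair per frame, so the nucleotide sequence can be printed alongside each translation.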

November 27, 1997 

• In SRS (Sequence Retrieval System), SWISS-PROT DR (Database cross-Reference) and RC 
(Reference Comment) lines have been indexed. You may search for e.g. all entries with 
cross-references to PDB (enter 'PDB' in the DbName field), or all proteins that have been found in 
E. coli strain K12 (enter 'K12' in the 'RefComment' field). 

• It is now possible to retrieve a number of SWISS-PROT/TrEMBL entries by specifying a list 
of accession numbers or entry names (ID). 

• There are 4 new SWISS-PROT documents : 

• humchr18.txt : an index of protein sequence entries encoded on human chromosome 18; 

• pcc6803.txt : an index of Synechocystis strain PCC 6803 entries; 

• deleteac.txt : an index of deleted accession numbers. 

• upflist.txt : UPF (Uncharacterized Protein Families) list and index of members. 

October 7, 1997 

• We have implemented a search index, ExPASy Index, to help you find information within the 
ExPASy server. The index contains all the documents of ExPASy (currently about 800), 
except the database entries. It has been automatically indexed by the Marvin robot. 

Our new service BioHunt uses the same concept and allows you to search the internet for 
molecular biology information. In the current version, 17136 documents have been indexed. 

• The ScanProsite tool has been modified to work with TrEMBL as well as SWISS-PROT. 
Furthermore, the part of the program which allows you to scan a pattern against SWISS-PROT 
(and TrEMBL) has been improved and now avoids the previously frequent 'Document contains 
no data' error for large scan results. 

• In PeptideMass, the set of post-translational modifications with discrete mass differences 
considered in peptide mass computation now also contains O-GlcNAc (documented as ft 
carbohyd glcnac in SWISS-PROT) and C-Mannosylation of Tryptophan ( ft carbohyd 
c-mannosyl). Thus, 17 post-translational modifications are now considered in PeptideMass. 

For examples, try CRAA_BOVIN or RNKD_HUMAN; don't forget to select "display all 
known post-translational modifications" and click on the "Perform" button. 

• There is a new SWISS-PROT document: 

mgdtosp.txt - Index of MGD entries referenced in SWISS-PROT. 

• Hyperlinks have been added from SWISS-PROT entries to the TIGR Microbial Database, 
which provides links to the information provided by TIGR on the genes encoded in the 
genomes they have sequenced (so far these are: Haemophilus influenzae, Helicobacter pylori, 
Methanococcus jannaschii, and Mycoplasma genitalium). (Example: FDHB_METJA) 

We have also created a specific file containing all SWISS-PROT entries containing 
cross-references to the TIGR database in our series of "special selections", which is updated 
weekly. 

• SWISS-PROT reference (RL) lines and PROSITE references referring to one of the journals 
available at IDEAL, an online electronic library containing all 175 Academic Press journals, 
now contain direct links to the IDEAL server if the article was published in 1996 or later. From 
this, a 'Guest login' leads to the abstract of the article. (Example: RGSE_RAT) 

September 5, 1997 

Some new features of ExPASy: 

• The PeptideMass program has been modified to take into account up to 2 missed cleavage 
sites. A new column 'MC' has been added to the output which indicates the number of missed 
cleavages, and peptides resulting from 0, 1 or 2 missed cleavages are displayed in different 
colours. 

• A new parameter has been added in the ProtParam program: ProtParam results now include the 
grand average of hydropathicity (GRAVY) for a given protein. 

• At the bottom of each SWISS-PROT and TrEMBL entry, there is now a link for 
displaying the entry in FASTA format (example: P11553). 

• The local submission form to the WU-BLAST server at Lausanne has been changed to use as 
the default database the set of non-redundant protein databases SWISS-PROT, TrEMBL and 
TrEMBL_NEW. 

• There are two new SWISS-PROT documents : 

- metallo.txt - Classification of metallothioneins and index of MT entries 

- hpylori.txt - Index of Helicobacter pylori strain 26695 chromosomal entries 

• The display of current and previous Swiss-Flash bulletins has been redesigned: A table is 
available which lists all Swiss-Flash bulletins by category, including date, title and author of 
the bulletins. 
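
The missed-cleavage handling described for PeptideMass earlier in this list can be sketched as follows. This is a minimal illustration, not the PeptideMass code: it assumes a trypsin-like rule (cut after K or R unless the next residue is P), and the function names are hypothetical. Each peptide is returned with its number of missed cleavages, which corresponds to the 'MC' column mentioned above.

```python
def cleavage_sites(seq):
    """Positions i such that the chain is cut between seq[i-1] and seq[i].

    Assumed rule: cut C-terminal to K or R, except when followed by P.
    """
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] in "KR" and seq[i + 1] != "P"
    ]


def digest(seq, max_missed=2):
    """Return (peptide, missed_cleavages) pairs for 0..max_missed missed sites."""
    seq = seq.upper()
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break  # not enough downstream fragments left to join
            peptides.append((seq[bounds[start]:bounds[end]], missed))
    return peptides
```

A peptide with `missed == 0` is a complete cleavage product; higher values mean that one or two internal cleavage sites were left uncut.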

July 24, 1997 

We now have an SRS server (version 5) running on ExPASy. SRS (Sequence Retrieval System) 
allows you to retrieve entries across multiple databases with more sophisticated criteria than those 
allowed by the text-search interfaces available from the SWISS-PROT top page. 

You can combine all the fields with logical operators and achieve queries like: 

• Give me all vertebrate proteins having a PH domain and that are longer than 1000 AA, or 

• Give me all calcium-binding proteins localized in the endoplasmic reticulum. 

Five databases are indexed: SWISS-PROT, TrEMBL, TrEMBL_NEW, PROSITE, and ENZYME. 
SWISS-PROT and TrEMBL are updated on a weekly basis so that the set of these two databases 
stays non-redundant. 

TrEMBL entries are now fully accessible on ExPASy via a CGI script. The hypertext version of 
TrEMBL contains links to various databases and allows direct access to sequence-analysis tools such 
as Swiss-Model, Blast, ProtParam, ProtScale, Compute pI/Mw and PeptideMass, as is the case for 
SWISS-PROT. 

If you wish to link to a TrEMBL entry, you can use the following URL: 

http://www.expasy.ch/cgi-bin/get-sprot-entry?<TrEMBL-AC> 

e.g. to create a link to TrEMBL entry Q00061, use: 
http://www.expasy.ch/cgi-bin/get-sprot-entry?Q00061 
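
For anyone generating such links programmatically, the URL scheme above can be wrapped in a small helper. This is only a sketch: the base URL is the one quoted in this announcement and may no longer resolve, and the function name is made up.

```python
from urllib.parse import quote

# Base URL as quoted in the announcement above (historical; may not resolve today).
BASE = "http://www.expasy.ch/cgi-bin/get-sprot-entry"


def trembl_entry_url(accession):
    """Build a link to a TrEMBL entry from its accession number."""
    return f"{BASE}?{quote(accession.strip())}"
```

For example, `trembl_entry_url("Q00061")` reproduces the link shown above.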

June 6, 1997 

We are actively seeking any type of updates and/or corrections of SWISS-PROT entries, whether 
they have been published or not, and we encourage our users to submit their suggested updates or 
corrections. This can be done using our new submission form, which can be accessed through an 
active link from the SWISS-PROT home page or from the bottom of each SWISS-PROT entry. 
Please read the tips and guidelines to find out what type of information we are seeking and how to 
proceed. We would like to thank our users in advance for any contribution they can make in 
updating and correcting SWISS-PROT! 

The tool which allows you to visualize and highlight the subsequence corresponding to a line in a 
SWISS-PROT feature table (FT) has been improved and is now using colour to highlight the 
subsequences in question. Example: in FA9_HUMAN: 

FT DOMAIN 93 129 EGF-LIKE 1, CALCIUM-BINDING (POTENTIAL). 

May 21, 1997 

At the bottom of each page displaying a SWISS-PROT entry, you will now find a link to a graphical 
Feature Table viewer (Java Applet) written by Thomas Junier at the Bioinformatics Group of ISREC 
Lausanne. 

We have added several new hyperlinks in SWISS-PROT entries: 

• The DR lines containing cross-references to EMBL/GenBank/DDBJ now include a link to a 
page displaying exclusively the corresponding CoDing Sequence (CDS). 

• The RL lines referring to recent articles in certain journals whose WWW servers are 
maintained in collaboration with HighWire Press are now active hyperlinks to the abstracts of 
the corresponding articles. From the abstract page you can frequently access directly a full text 
on-line version of the article. The journals include J. Biol. Chem., Proc. Natl. Acad. Sci. USA, 
Science, Cell, etc. 

• Entries with cross-references to MIM are now also linked (through a new virtual "dr 
Genecards" line) to GeneCards, a database integrating information about the functions of 
human genes and their products, and of biomedical applications based on this knowledge. 
Example: BRG1_HUMAN. 

• Entries belonging to family 1 of G-protein coupled receptors (as documented in feature tables) 
now contain active links to GPCRDB-Snakes diagrams (through the new virtual " dr 
GPCRDB-snakes" line) prepared by the GPCRDB group at EMBL Heidelberg. 

Example: 5H1A_HUMAN. 

There are 3 new SWISS-PROT documents : 

• humchr19.txt : an index of protein sequence entries encoded on human chromosome 19 

• ngr234.txt : a table of putative genes in Rhizobium plasmid pNGR234a 

• initfact.txt : a list of translation initiation factors 

On the ExPASy anonymous FTP server, the SWISS-PROT update files new_seq.dat, upd_ann.dat 
and upd_seq.dat are now also available in compressed form in the directory 
/ftp/databases/swiss-prot/updates_compressed/. 

March 27, 1997 

We have modified and improved access from ExPASy to various BLAST (Basic Local Alignment 
Search Tool) similarity search services: 

In the tools page, you can now choose between 5 different interfaces to BLAST servers in 
Switzerland, the USA and Germany: 

Switzerland: 

Running on a 2-processor Pentium Pro machine, the new WU-BLAST server at EMBNet 
Switzerland in Lausanne has a faster response time than the EPFL server, and should be more 
stable. As opposed to the original NCBI BLAST algorithm, WU-BLAST generates gapped 
alignments. A full set of weekly updated databases is provided. 

• Local interface to WU-BLAST at EMBNet-CH (Lausanne) 

• Original interface to WU-BLAST at EMBNet-CH (Lausanne) 
USA: 

• Local interface to BLAST at NCBI 

• Original interface of BLAST at NCBI 
Germany: 

• WU-BLAST at Bork's group in EMBL (Heidelberg) 

For direct BLAST submission from a SWISS-PROT entry (icons at the bottom of the page displaying 
an entry - example), you have the choice between the servers at NCBI and EMBNet-CH. 

The following documents have been added to the list of SWISS-PROT documents : 

• bloodgrp.txt - Blood group antigen proteins 

• fly.txt - Index of Drosophila entries and their corresponding FlyBase cross-references 

• mjannasc.txt - Index of Methanococcus jannaschii entries 

• mgenital.txt - Index of Mycoplasma genitalium strain G-37 chromosomal entries 

March 17, 1997 

We have completely rewritten the Swiss-Shop sequence alerting system for SWISS-PROT that 
allows you to automatically obtain (by email) new sequence entries relevant to your field(s) of 
interest. 

In the new version of Swiss-Shop, some new features have been added: 

As before, you can either launch a sequence/pattern based search or a keyword based search. 

• For a sequence based search, you need to specify a SWISS-PROT ID or AC or a raw protein 
sequence, and your sequence will be scanned, at each weekly update of SWISS-PROT, against 
the new sequences in the database using the alignment program BLAST. Sequences thus found 
to be similar to your protein will be sent to you by email. It is up to you to specify the BLAST 
probability threshold for P(N) (the probability that an alignment scoring this well would arise by 
chance), and you will receive a list of all sequences for which this probability is below the specified value. 

• For a pattern based search, enter a PROSITE ID or AC or a pattern in PROSITE format, and 
Swiss-Shop will scan this pattern, at each weekly update of SWISS-PROT, against the 
sequences that have been added in SWISS-PROT since the last weekly update. You will 
receive the list of new entries matching your pattern. 

• For a keyword based search, it was previously possible to specify keywords from 
SWISS-PROT OS, OC, OG (taxonomy), RA (authors), KW, DE, CC lines. In addition to these 
lines, you can now also search DR (Cross-references to other databases) and FT (feature) lines 
with one or more specified keywords. Swiss-Shop will look for these keywords on the 
corresponding lines of all SWISS-PROT entries added in the database since the last weekly 
release. 
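
The keyword search over selected line types described above can be pictured with a short Python sketch. It relies only on the flat-file convention that each line starts with a two-letter line code followed by its content; it is not the Swiss-Shop implementation. The function names are hypothetical, and requiring all keywords to match (rather than any) is an assumption.

```python
# Line types named above as searchable by keyword.
SEARCHABLE = ("OS", "OC", "OG", "RA", "KW", "DE", "CC", "DR", "FT")


def lines_by_code(entry_text):
    """Group the content of a flat-file entry by its two-letter line code."""
    grouped = {}
    for line in entry_text.splitlines():
        code, content = line[:2], line[5:].strip()
        if len(code) == 2 and code.isalpha():  # skips '//' and sequence lines
            grouped.setdefault(code, []).append(content)
    return grouped


def entry_matches(entry_text, keywords, codes=SEARCHABLE):
    """True if every keyword occurs (case-insensitively) on the chosen line types."""
    grouped = lines_by_code(entry_text)
    text = " ".join(c for code in codes for c in grouped.get(code, [])).upper()
    return all(keyword.upper() in text for keyword in keywords)
```

Restricting `codes` to, say, `("FT",)` limits the search to feature lines, in the same way that a request can be limited to particular line types.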

Furthermore, we now offer you 4 different output formats. You can choose to receive the sequences 
matching your query 

• as a file in SWISS-PROT format or 

• as a list of SWISS-PROT accession numbers or 

• in the form of a short report containing information from SWISS-PROT ID, AC, DE, OS lines or 

• as a list of SWISS-PROT accession numbers with hypertext links to the corresponding entries 
on the ExPASy WWW server. This allows you to view your email message with your Web 
browser and to follow the hypertext links to the full entries on ExPASy. 

You can further specify if you wish to be notified every time Swiss-Shop is run, even if there are no 
new sequences matching your query, or to receive an email report only when there are new 
SWISS-PROT entries matching your search terms. 
You can specify the expiration date of your request, the default being one year after submission. 
For editing previous requests (e.g. to update the expiration date or to modify search criteria) you can 
enter a password for each new request. This allows you to open the request later and edit it on-line 
rather than deleting it and submitting a new one. 

March 6, 1997 

New and improved protein identification tools: 
There is a new tool on ExPASy: 

• MultiIdent : This tool achieves protein identification using parameters such as protein species, 
estimated pI and MW, AA composition, sequence tag, and peptide mass fingerprinting data. It 
is particularly suited to the identification of proteins across species boundaries. Currently, the 
program works by first generating a set of proteins in the database with AA compositions close 
to the unknown protein, as for AACompIdent. Theoretical peptide masses from the proteins in 
this set are then matched with the peptide masses of the unknown protein to find the number of 
peptides in common (number of "hits"). Three types of lists are produced in the results. Firstly, 
a list where proteins from the database are ranked according to their AA composition score; 
secondly, a list where proteins are ranked according to the number of peptide hits they showed 
with the unknown protein; and thirdly, a list that shows only proteins that were present in both 
the above lists, where these proteins are ranked according to an integrated AA and peptide hit 
score. In all these lists, protein pI, MW, and species of origin (using a term from 
SWISS-PROT OS or OC lines) and keywords can be used, as in AACompIdent, to increase the 
specificity of searches. 

The following tools have been improved, offering numerous additional features: 

• AACompIdent (identification of a protein from its amino acid composition) 

You can restrict your search by specifying one or more term(s) from the OS or OC lines of 
SWISS-PROT (example: HOMO SAPIENS or MAMMALIA). You can also enter a keyword 
appearing on the KW lines of SWISS-PROT to further restrict your search. For example, a 
keyword of "CALCIUM-BINDING" could be used in conjunction with the OC term 
"MAMMALIA" to see if a user-entered protein matches well with any mammalian 
calcium-binding proteins in the database. 

• TagIdent now allows, for one or more species (term from SWISS-PROT OS or OC lines) and 
with an optional keyword, 

1. the generation of a list of proteins close to a given pI and Mw, 

2. the identification of proteins by matching a short sequence tag of up to 6 amino acids against 
proteins in the SWISS-PROT database close to a given pI and Mw, 

3. the identification of proteins by their mass, if this mass has been determined by mass 
spectrometric techniques. 

For PeptideMass, Compute pI/Mw, AACompSim and all the above-mentioned tools, 
documentation and references have been added and the submission forms have been 
reformatted and improved. 



March 4, 1997 

Thanks to the generosity of the Geneva Government, we have been able to acquire a new computer 
for the ExPASy server (a Sun Microsystems Ultra Server Enterprise 2). The server is now accessible 
at URL 

http://www.expasy.ch 

The old URL remains valid for some time. 

January 9, 1997 

Some new features of ExPASy: 

• New active links have been established from SWISS-PROT entries 
o to the TRANSFAC database of transcription factors; 
o from Bacillus subtilis entries to Micado (Microbial Advanced Database Organization) at 
INRA, France; 
o to local copies of MEDLINE abstracts. We now give the user the choice of retrieving a 
MEDLINE abstract (example: 90368558) from either NCBI or Geneva; 
o to our Peptide Mass tool which cuts a protein sequence with a chosen enzyme and 
computes the masses of the resulting peptides. 

• From Release 35 on, SWISS-PROT comments (CC) lines can contain a new topic 
"DATABASE" which contains information about related databases catering for a specific 
protein or for a very limited number of proteins. Most of these databases are mutation 
databases, reporting defects linked to a genetic disease. If such a database is available 
electronically, the CC DATABASE lines provide the relevant electronic coordinates, e.g. in 
P29965 (CD4L_HUMAN): 

CC -!- DATABASE: NAME=CD40Lbase; NOTE=European CD40L defect database; 
CC WWW="HTTP://www.expasy.ch/cd40lbase/"; 
CC FTP="ftp.expasy.ch/databases/cd40lbase". 

• There is a new SWISS-PROT document: 
yeast13.txt - a list of Yeast Chromosome XIII entries. 

• Two new features have been added in ENZYME entries: 

o direct links from an enzyme to all relevant maps of Boehringer Mannheim's Biochemical 
Pathways and 

o links to the WIT (What Is There) database of metabolic pathways. 



November 26, 1996 

The Boehringer Mannheim Biochemical Pathways maps and index have been digitised and are now 
accessible on this server. Enter a keyword (such as, for example, Oxoacyl) and surf on the biochemical 
pathways maps. 

November 1, 1996 

CD40Lbase, the European CD40L Defect Database prepared by Manuel Peitsch, has been made 
accessible through this server. The purpose of CD40Lbase is to collect clinical and molecular data on 
CD40 ligand defects leading to X-linked Hyper-IgM syndrome. 

A new tool is available from the Tools page : The PeptideMass Peptide Characterisation Software. 
This program is designed to calculate the theoretical masses of peptides generated by the chemical or 
enzymatic cleavage of proteins, to assist in the interpretation of peptide mass fingerprinting and 
peptide mapping experiments. Protein sequences can be provided by the user or can be a code name 
for a protein in the SWISS-PROT protein database. When proteins of interest are specified from 
SWISS-PROT, the program considers all annotations for that protein in the database, and uses these 
in order to generate the correct peptide masses and warn users about peptides that are not likely to be 
found when undertaking peptide mass fingerprinting. Many protein post-translational modifications 
which affect the masses of peptides can thus be taken into consideration. 
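
The core calculation behind a theoretical peptide mass can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PeptideMass program: it uses standard monoisotopic residue masses rounded to five decimals, adds one water for the free termini, reports the neutral mass, and ignores the post-translational modifications and annotations mentioned above. The names are made up for this example.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added for the free N- and C-termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide.upper()) + WATER


def peptide_masses(peptides):
    """Map each peptide string to its theoretical mass."""
    return {p: peptide_mass(p) for p in peptides}
```

Feeding the fragments of a digest into `peptide_masses` gives the list of theoretical masses that would be compared against a measured peptide mass fingerprint.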

In PROSITE and ENZYME, we have added the possibility to save all referenced SWISS-PROT entries 
to a file on our anonymous FTP server (in the outgoing directory). 

The Compute pI/Mw tool has been included in the list of sequence analysis tools that can be directly 
accessed from a SWISS-PROT entry. 

Two new SWISS-PROT documents are available: 
- humchr20.txt - an index of protein sequence entries encoded on human chromosome 20 
- tisslist.txt - a list of the currently valid values for the "TISSUE" topic of the RC line type in 
SWISS-PROT. 

September 30, 1996 

A new SWISS-PROT document has been added: ribosomp.txt - an index of ribosomal proteins 
classified by families on the basis of sequence similarities. 

In ec2dtosp.txt, an index of E. coli Gene-protein database (ECO2DBASE) entries referenced in 
SWISS-PROT, we have established direct links to ECO2DBASE, and SWISS-PROT entries now 
also contain links to ECO2DBASE. 

At the end of each page displaying a SWISS-PROT entry we have added links to our sequence 
analysis tools ProtParam and ProtScale, which allow the user to directly submit the SWISS-PROT 
sequence to these tools. 

September 19, 1996 

Some new features of ExPASy: 

• We have created a new protein identification tool called TagIdent. This is a modification of the 
old tool GuessProt. The user can now identify proteins from 2-D gels by giving protein pI and 
MW estimates, a species or organism classification of interest, and a short sequence tag of up 
to 6 amino acids. This tag can be derived from the N-terminus, the C-terminus or from internal 
peptides of a protein. The results are now sent to the user by e-mail, allowing many searches to 
be done at the same time. If you only want to generate a list of potential proteins in a specific 
pI or MW range (as was the function of the old tool GuessProt), do not select the TAG option 
in the form. 

• An email option has been added to the tool ScanProsite: if you want to scan a pattern against 
SWISS-PROT, you now have the option of having the results of your query sent by email, 
which should avoid the previously frequent timeout problems and is particularly useful for 
complex patterns. 

ScanProsite, which only scans SWISS-PROT with PROSITE pattern entries (as opposed to 
rule and matrix entries), can now also be used with the PROSITE rule entry PS00013, 
PROKAR_LIPOPROTEIN. 

• SWISS-PROT entries have been linked to DDBJ, the DNA Data Bank of Japan. We have also 
added direct links to the Bacillus subtilis genomic data bank, SubtiList, and to the Yeast Protein 
Database YPD in relevant SWISS-PROT entries. 

• Links have been established from most feature (FT) lines of SWISS-PROT entries to pages 
that highlight the subsequence in question, both in 1- and in 3-letter amino acid codes. 
Example: in FA9_HUMAN: 

FT DOMAIN 93 129 EGF-LIKE 1, CALCIUM-BINDING (POTENTIAL). 

• We have added three new SWISS-PROT documents : 

humchrx.txt - an index of protein sequence entries encoded on human chromosome X 
yeast7.txt - a list of Yeast Chromosome VII entries 
yeast14.txt - a list of Yeast Chromosome XIV entries. 

• 2D Hunt, a database created and continuously updated by the Marvin robot contains sites 
related to electrophoresis and specifically to 2-D electrophoresis. It is now searchable from the 
SWISS-2DPAGE top page. 

April 11, 1996 
AACompIdent : New options - AACompIdent is a tool which allows the identification of a protein 
from its amino acid composition. It searches SWISS-PROT for proteins whose amino acid 
compositions are closest to the amino acid composition given. Two new options and a new 
constellation have been added to this tool: 

A. C-Terminal display in tagging option 

The user may now choose between displaying the C or N terminal side of the proteins that score best. 

B. Permutation search in tagging option 

This option searches for all permutations of the given tag in the sequences. 

C. Constellation 4 

Constellation 4 has been added: Ala, Ile, Pro, Val, Arg, Leu, Ser, Asx, Lys, Thr, Glx, Met, His, Phe 
and Tyr. (Asp+Asn=Asx; Gln+Glu=Glx; Gly, Cys and Trp are not considered). 
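
To make the constellation concrete, here is a small Python sketch that reduces a protein sequence to the constellation 4 groups listed above: Asp and Asn are pooled as Asx, Gln and Glu as Glx, and Gly, Cys and Trp are ignored. Expressing the result as a percentage of the considered residues only is an assumption, and the names are hypothetical.

```python
from collections import Counter

CONSTELLATION_4 = (
    "Ala", "Ile", "Pro", "Val", "Arg", "Leu", "Ser", "Asx",
    "Lys", "Thr", "Glx", "Met", "His", "Phe", "Tyr",
)

# One-letter code -> constellation group; G, C and W are deliberately absent.
ONE_TO_GROUP = {
    "A": "Ala", "I": "Ile", "P": "Pro", "V": "Val", "R": "Arg",
    "L": "Leu", "S": "Ser", "D": "Asx", "N": "Asx", "K": "Lys",
    "T": "Thr", "E": "Glx", "Q": "Glx", "M": "Met", "H": "His",
    "F": "Phe", "Y": "Tyr",
}


def constellation_composition(seq):
    """Percent composition over the constellation 4 groups."""
    counts = Counter(ONE_TO_GROUP[aa] for aa in seq.upper() if aa in ONE_TO_GROUP)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no residue from the constellation")
    return {group: 100.0 * counts[group] / total for group in CONSTELLATION_4}
```

Two compositions computed this way can then be compared group by group, which is the kind of input a composition-based search works from.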

March 22, 1996 

We have added a new tool, ProtScale, which allows you to compute and represent the profile 
produced by an amino acid scale on a selected protein. 50 scales are provided, including 'classics' 
such as the Kyte and Doolittle hydrophobicity scale. 
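
A profile of this kind can be sketched as a sliding-window average over per-residue scale values. The Python example below uses the Kyte and Doolittle hydropathy values and a plain unweighted window; the window size, the unweighted averaging and the function name are assumptions for illustration and do not describe how ProtScale itself is implemented.

```python
# Kyte and Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_profile(seq, scale=KYTE_DOOLITTLE, window=9):
    """Return (1-based centre position, mean scale value) for each window."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [scale[aa] for aa in seq]
    half = window // 2
    return [
        (i + half + 1, sum(values[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]
```

Plotting the second element of each pair against the first gives the familiar hydropathy plot; any other per-residue scale can be passed in through `scale`.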

Links have been added between relevant SWISS-PROT entries and the 2D gel protein databases at 
Harefield. 

A new SWISS-PROT document has been added which describes the nomenclature of glycosyl 
hydrolases (GH) and that includes an index of sequences that belong to the various GH families. 

A PC (MS-Windows) version of LALNVIEW (graphical viewer for pairwise alignments) is now 
available. 

Nicolas Guex has produced a new logo for PROSITE. 

February 16, 1996 

We have added a new tool, SIM, which computes a user-defined number of best non-intersecting 
alignments between two sequences. The results of the alignment can be viewed graphically using the 
LALNVIEW program developed by Laurent Duret, which is currently available for Macs and 
UNIX. 

Additional links have been added in the tools page, notably to the Weizmann Institute ultra-fast 
rigorous (Smith/Waterman) similarity searches using the Bioccelerator and to the Garnier, 
Osguthorpe and Robson (GOR) secondary structure prediction method at SBDS. 

The SeqAnalRef database now includes a section listing authors' email addresses and, where available, 
WWW home pages. It is also possible to access the links from a page displaying either a reference list or a 
single reference. 

Amos has recently started to create a list of Biomolecular servers for his own usage, but as some 
people have asked to access this list (which is under construction), we are making it available from 
the ExPASy top page. Many other small changes were carried out in the last two months. 
We thank you for using ExPASy (we have now reached a cumulative total of 4 million connections). 

December 14, 1995 

After 29 months of existence, the ExPASy molecular biology server has received a new logo, designed 
and produced by Nicolas Guex. 

October 23, 1995 

The Melanie page has been reorganised. With the announcement of release 2.1 of the Melanie II 2-D 
PAGE analysis software package, a complete up-to-date description of the software as well as a 
comprehensive tutorial are now available. 

October 1995 

Links have been added between SWISS-PROT Escherichia coli K12 chromosomal entries and the 
EcoCyc database, the encyclopedia of E. coli Gene and Metabolism. 

You can now search in PROSITE by citation. 

October 9, 1995 

Some new features of ExPASy: 

• Search in SWISS-PROT by citation - When you call this option, you are prompted to enter the 
name of a journal and optionally a volume number and/or a year. The program is written in 
such a way that you can enter either the full name of a journal or its official abbreviation. 

• RandSeq - a new tool to generate random protein sequences. 

• SWISS-PROT document haeinflu.txt - Index of Haemophilus influenzae RD chromosomal 
entries and gene names with links to the TIGR and EMBL servers. 

• SWISS-PROT document submit.txt - Description of how to submit sequence data to the 
SWISS-PROT data bank. 

• SWISS-PROT document aatrnasy.txt - List of aminoacyl tRNA synthetases. 

• Swiss-Jokes - A new page to give access to our collection of jokes from the fields of molecular 
biology and of computing. 

Many other changes have been made, such as the redesign of the Geneva local pages and the addition, in 
the tool page, of a link to ProfileScan. 

It should also be noted that when you search in SWISS-PROT by either description or full text 
and your search criteria return more than two entries, you can save these entries to a file on our 
anonymous FTP server (in the outgoing directory). 

September 19, 1995 
AACompIdent : New options - AACompIdent is a tool which allows the identification of a protein 
from its amino acid composition. It searches SWISS-PROT for proteins whose amino acid 
compositions are closest to the amino acid composition given. A new option and a new constellation 
have been added to this tool: 

A. Tagging option 

With this option, the first 40 amino acids of each protein are printed in the result, instead of the 
protein name. One may optionally also enter a tag (a short sequence, typically 3 to 8 residues) 
which will be matched with the sequences of the resulting proteins. Proteins matching the tag will be 
marked. 

B. Free constellation 

This is a free constellation, that is, one may select any set of amino acids one likes. One just 
has to fill in the composition values for the selected amino acids. The values will then be 
normalised, so that the total makes 100 (percent). 
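
The normalisation step amounts to rescaling the entered values so that they sum to 100. A minimal Python sketch, with a made-up function name:

```python
def normalise_composition(values):
    """Rescale composition values so that they add up to 100 (percent)."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("composition values must have a positive sum")
    return {aa: 100.0 * v / total for aa, v in values.items()}
```

For instance, entering 2 for Ala and 6 for Leu yields 25 and 75 percent respectively.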

September 1995 

A new page has been created: WORLD-2DPAGE is an index to all known federated 2-D PAGE 
database servers, as well as to 2-D PAGE related servers and services. 

July 22, 1995 

A new tool has been implemented on ExPASy: ProtParam allows the computation of various 
physical and chemical parameters for a given protein stored in SWISS-PROT or for a user-entered 
sequence. The computed parameters include the molecular weight, theoretical pI, amino acid 
composition, extinction coefficient, estimated half-life, instability index and aliphatic index. 

The Journal of Biological Chemistry (JBC) has a WWW server where abstracts and full text of 
articles are made available. We are happy to announce the implementation of what we believe to be 
the first direct link in a sequence database between a reference and the full text version of a cited 
article. Recent JBC references are directly linked to the corresponding entry point in the JBC Server. 
If you want to see such a link, take a look at reference 3 in SWISS-PROT entry KDSA_ECOLI. 

The SWISS-PROT document file jourlist.txt, which provides information on all the journals cited in 
that database, now contains links to WWW or Gopher servers set up by a variety of publishers of 
academic journals. 

Two new SWISS-PROT documents have been added: one is a nomenclature and index of peptidase 
sequences, the other is the list of Yeast Chromosome VI entries in SWISS-PROT. 

June 19, 1995 

A new tool has been implemented on ExPASy: ScanProsite allows you either to scan a protein sequence
for the occurrence of patterns stored in the PROSITE database or to scan the SWISS-PROT database,
including weekly releases, for the occurrence of a pattern.
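
The idea of scanning a sequence for a pattern can be sketched by converting a PROSITE-style pattern into a regular expression. This sketch assumes the basic pattern syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repetition, "<" and ">" for the sequence ends). The function names are illustrative; this is not the ScanProsite code.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repetition suffix such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."
        elif core.startswith("[") and core.endswith("]"):
            token = "[" + core[1:-1] + "]"
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = re.escape(core)
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(parts)


def scan_sequence(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]
```

The lookahead wrapper lets overlapping occurrences be reported, which matters for short patterns.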

We are happy to announce a new "service": Swiss-Quiz. The principle of this quiz is to answer 10
randomly chosen questions relating to the fields of molecular biology, biochemistry and genetics.
Each month, we will randomly pick one person among all those who have obtained a perfect score
(and it's not so easy!) and will send that person some delicious Swiss chocolate!

Links have been added from SWISS-PROT to the Saccharomyces genomic database (SacchDb) at
Stanford.

A new SWISS-PROT document has been added: a nomenclature and index of allergen
sequences.

May 26, 1995 

A new service is available: SWISS-2DSERVICE . The Two-Dimensional Gel Electrophoresis 
Laboratory of Geneva, Switzerland, is running a highly reproducible method for the two-dimensional 
separation of proteins. The laboratory now provides a 2-D PAGE service to which you may send
your samples for analysis. This service includes analytical and preparative high-resolution 2-D 
PAGE, electrotransfer on membranes and/or amino acid composition. 

May 17, 1995 

New link in the Tools page to the multiple sequence alignment at Washington University.

May 11, 1995

Two links have been added to the SWISS-PROT entries. The first one directly submits a request to
Swiss-Model for a 3D model of the current SWISS-PROT protein. The result is then sent back by
e-mail. The second one allows you to perform a sequence alignment with the current sequence, using
NCBI's Basic Local Alignment Search Tool. This link is especially interesting in the virtual
SWISS-PROT entries produced by the Translate tool.

May 5, 1995 

We announce a new service, SWISS-FLASH, that reports news of databases, software and services
developments from the Swiss biocomputing groups responsible for the ECD, ENZYME, LISTA,
PROSITE, SeqAnalRef, SWISS-2DPAGE, SWISS-3DIMAGE and SWISS-PROT databases; the
Melanie software package; the WWW ExPASy server; the SWISS-Model, SWISS-Shop and other
network-based computational tools; and the SWISS-2DSERVICE services. If you subscribe to this
service, you will automatically get the SWISS-Flash bulletins by electronic mail.

The SWISS-3DIMAGE database has been completely reorganised and indexed. The database is now 
searchable in the same way as the other SWISS-*** databases. We now also supply pictures in JPEG 
format, in addition to GIF and SGI. The images may still be downloaded by FTP. 

Links to REBASE now point to the version maintained at Johns Hopkins, whose layout is nicer than
our own text-based version!

April 19, 1995 

We added Translate, a new tool which allows the translation of a nucleotide (DNA/RNA) sequence 
to a protein sequence. 
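
Translation of a nucleotide sequence can be sketched with the standard genetic code. The sketch below is illustrative only and is not the ExPASy Translate tool; the `frame` parameter and the use of "X" for codons containing unknown bases are assumptions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides, frame=0):
    """Translate DNA or RNA in the given forward frame (0, 1 or 2); '*' marks a stop."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(seq) - 2, 3):
        # Codons with ambiguous or unknown bases are reported as 'X'.
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`; RNA input is handled by treating U as T.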

Most of the pages in the server have been "refreshed" to make them more readable.

March 21, 1995

Links have been added from SWISS-PROT to the LISTA database of budding yeast (Saccharomyces
cerevisiae) genes coding for proteins prepared under the supervision of Patrick Linder.

March 7, 1995

Links have been added from SWISS-PROT to the HSSP database of structure-sequence alignments 
from the Protein Design Group, EMBL, Heidelberg. 

March 2, 1995

During the last two months, various links have been added: 
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• from SWISS-PROT to the SubtiList and YEPD databases 

• from ENZYME to PROSITE and to the Ligand database in Kyoto

• internally from PROSITE entries to other relevant PROSITE entries 

Links from SWISS-PROT to FlyBase use the new WWW server for that database. 
Many new SWISS-PROT documents have been added. 

The page on the Melanie 2-D PAGE analysis software has been completely redesigned and now
includes an on-line tutorial, as well as a request for information form.

December 7, 1994 

In order to help users navigate through the ExPASy server, we have added graphical examples. More 
will be added in the future. See for example: C. elegans examples or the who's who on ExPASy page.
Thanks to Brigitte Boeckmann for the illustrations. 

October 31, 1994

ENZYME : the ENZYME Data Bank has been added to the ExPASy server. This database may be
accessed by EC number, name, compound, cofactor, comment, or by browsing through the list of
classes, subclasses and sub-subclasses. Any entry in SWISS-PROT that contains an EC number in
the DE line also has a direct link to ENZYME (by clicking on the EC number).

October 20, 1994 

New services:

• Swiss-Shop - a sequence alerting system for Swiss-Prot that allows you to automatically
obtain new sequence entries relevant to your field(s) of interest. 

• Swiss-Model - an automated knowledge-based protein modelling server. 

Compute pI/Mw: the tool to compute pI and Mw now also accepts a list of ID/ACs.

SWISS-PROT: in PDB cross-reference lines, there is now a link called RASMOL, sending the PDB
entry as a chemical/pdb MIME type. On Unix systems, if you add, in the file .mailcap in your home
directory, a line of the form

chemical/pdb; rasmol %s

then RASMOL will automatically be launched to display the protein 3D structure. This also works
with any other program which accepts PDB coordinates. On systems other than Unix, this may also
be specified. See your browser's manual.

October 13, 1994

The SWISS-PROT top page has been re-modeled. A number of new functionalities and documents
have been added.

October 7, 1994 

New tools have been added: 

• Amino acid composition similarity search - the search may now also be performed from a
given SWISS-PROT entry, whose amino acid composition will be compared with the whole
SWISS-PROT database.

• Compute pI/Mw - Compute the theoretical pI and Mw from a SWISS-PROT ID or AC, or for
a given sequence.
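
A composition similarity search of the kind listed above can be sketched as follows. The scoring used here (sum of squared differences between percent compositions) and the toy database are assumptions made for illustration; the source does not state which measure the server uses.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percent composition over the 20 standard amino acids, in a fixed order."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def distance(comp_a, comp_b):
    """Assumed dissimilarity measure: sum of squared percent differences."""
    return sum((a - b) ** 2 for a, b in zip(comp_a, comp_b))


def rank_by_composition(query, database):
    """Return entry names sorted from most to least similar composition."""
    q = composition(query)
    scored = [(distance(q, composition(seq)), name) for name, seq in database.items()]
    return [name for _, name in sorted(scored)]


# Hypothetical entries, not real SWISS-PROT records.
toy_database = {"P1": "AAAAGGGG", "P2": "AAAGGGGL", "P3": "WWWWYYYY"}
ranking = rank_by_composition("AAAAGGGG", toy_database)
```

Because only the composition is compared, two proteins with the same residues in a different order are indistinguishable to this search.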

October 5, 1994

The gels run during the 2-D PAGE courses in Geneva are now displayed on the server.

September 29, 1994

SWISS-2DPAGE : protein maps now have a pI/Mw scale.

SeqAnalRef : the Sequence Analysis Bibliographic Reference database has been added to the ExPASy
server. This database may be accessed by keyword, by reference identifier, by author and by full text
search.

List of on-line experts : in SWISS-PROT and PROSITE top pages, a list of on-line experts gives you
the possibility to directly send questions to any of the listed experts. The list is organized by subject.

SWISS-PROT : new lists added: 

• List of abbreviations for journals cited 

• List of species has been made active 

• Yeast Chromosome III entries in SWISS-PROT

• Nomenclature of extracellular domain

• List of on-line experts

PROSITE : new 3D line with active links to PDB.

September 26, 1994 

In the tool AACompIdent for identifying a protein by its amino acid composition, options have been
added. They allow you to specify how many proteins should be displayed, as well as the pI and Mw range
in which the search should be performed.

Also, some old bugs have now been corrected.

September 12, 1994

The tool AACompIdent for identifying a protein by its amino acid composition has been corrected
and is now supposed to work. If you still encounter problems, please send us a mail.

June 17, 1994 

SWISS-PROT: added cross-references (DR lines) to GenBank. 

June 16, 1994

SWISS-PROT: added cross-references (DR lines) to MaizeDB, the Maize Genome Database of the
National Agricultural Library.

June 6, 1994

Added the PROSITE page: PROSITE entries may now be searched by description of sites and
pattern, by accession number, by author, and soon by full text search.

June 3, 1994

Added the GuessProt tool to the tools page: you may now get the SWISS-PROT proteins closest to a
given pI and Mw.

May 27, 1994 

In SWISS-PROT entries, added links to GCRDb - the G-Protein-Coupled Receptor DataBase.

Added the list of nomenclature related references for proteins to the SWISS-PROT top page.

May 26, 1994

Added a new reference 2-D PAGE map of Platelet to SWISS-2DPAGE.

May 20, 1994

The SWISS-2DPAGE team is now organizing a 2-D PAGE training in Geneva once every three
months.

Added the Yeast Chromosome XI list of proteins to the SWISS-PROT documentation page.

[...] such as BLAST, BLITZ, PROSITE search
and amino acid composition analysis, and more to come in the future.

March [...]

Added the list of restriction enzymes and methylases in SWISS-PROT top page.

[...] WWW server has been upgraded to a SPARCserver 10/51. It should perform much
faster now. If some features are not working, please tell us about it.

Links to OMIM are now direct links to the OMIM hypertext server from GDB. Thanks to Keith
Robison for informing me about it.

March [...]

2-D PAGE: Added experimental Amino Acid Composition [...]
protein's amino acid composition and the server will e-mail you the list of SWISS-PROT entries with
similar compositions, sorted by decreasing similarity measure.

Added a direct link to NCBI's BLAST Basic Local Alignment Search Tool (ExPASy and
SWISS-PROT top pages).

Starting with release 28, SWISS-PROT keyword search will be performed on the main release as
well as on the weekly updates.

In the SWISS-PROT page, added links to four additional active lists:

• Index of Escherichia coli K12 chromosomal entries in SWISS-PROT and their corresponding [...]
designations and WormPep cross-references

• Index of Dictyostelium discoideum entries in SWISS-PROT and their corresponding gene
designations and DictyDB cross-references

Feb To^SLw,tfer^ 

Erythroleukemia Cell (ELC).

In a SWISS-2DPAGE entry, it is now possible to compute the theoretical pI and Mw of the protein.

February 14, 1994

Added SWISS-2DPAGE Map Selection : you select a 2-D PAGE reference gel, click on a spot and
get information on the corresponding protein. See the SWISS-2DPAGE top page.

February [...]

Added a new reference 2-D PAGE map of Cerebrospinal Fluid to SWISS-2DPAGE.

January 28, 1994 

Added the bionet newsgroups.

January 25, 1994

Added an entry to SWISS-3DIMAGE images of crystallized proteins.

In SWISS-PROT entries which contain cross-references to PDB, added a cross-reference to
SWISS-3DIMAGE. Try for example AAT_ECOLI.

January 24, 1994

Added full text search of the SWISS-PROT protein sequence database.

January 17, 1994 

Added links to MEDLINE entries in SWISS-PROT, through NCBI's Entrez Server.

Added, in the SWISS-2DPAGE page, a link to the QUEST Protein Database Center.

December 1, 1993

Added a User Survey. Please help us improve the server by participating in this survey.

Added a new reference 2-D PAGE map of Lymphoma to SWISS-2DPAGE.

November 23, 1993

Added [...] Newsletter from November 22, 1993, describing the
World Wide Web.

November 18, 1993

Added links to the [...] Genome Database at Columbia, Missouri and to EMBnet Switzerland.

November 17, 1993 

Added the list of overall Top Ten users in the ExPASy server Activity Reports page. 

November 16, 1993 

Added Images of crystallized proteins from this server.

Added links to Harvard Biological Laboratories, the Gene-Server at University of Houston, the
EMBnet: Biocomputing in Europe, the biology servers index at USGS, Jackson Laboratory
WWW server and Keith Robison's Molecular Biology WWW sampler.

October 12, 1993

Added a list of specialised documents to the SWISS-PROT top page, such as 7-transmembrane
G-linked receptors, CD nomenclature for surface proteins of Human leucocytes and Vertebrate
homeobox proteins. Some of these lists then give direct access to corresponding SWISS-PROT
entries.

October 8, 1993

Added links to the Caenorhabditis elegans and Mycobacterium databases at INRA (France).
Added a link to the ExPASy server activity reports.



October 4, 1993

Moved to the NCSA server.

September 28, 1993 

Added the PDB Brookhaven Protein Data Bank of 3D structures. In SWISS-PROT, cross-references 
to PDB have now active links to the gopher server at Protein Data Bank. You may access the PDB 
entry or get the 3D image. Try for example the SWISS-PROT entry P00782. 

September 27, 1993 

Added the FlyBase database of genetic and molecular data for Drosophila. In SWISS-PROT and
EMBL, cross-references to FLYBASE are now active links. Therefore, SWISS-PROT now has
active links to SWISS-2DPAGE, EMBL, PROSITE, REBASE, OMIM and FLYBASE. EMBL has
active links to SWISS-PROT and FlyBase.

September 23, 1993

Added a link to the National Institute of Health Genobase server to our top page.

September 21, 1993

Announced the ExPASy server and SWISS-2DPAGE release 0 to bionet.announce.

August 1, 1993

Installed the ExPASy molecular biology server, release 0, beta version.



Last modified 21/OO/2004 by CHJB 

NCBI What's New Archive

1996

11/25 Electronic PCR

Electronic PCR is now available. PCR-based
sequence tagged sites (STSs) have been
used as landmarks for construction of various
types of genomic maps. Using "electronic
PCR" (e-PCR), these sites can be detected in
DNA sequences, potentially allowing their map
locations to be determined.

11/14 Sequin, Release 1.71

A new release of Sequin, a sequence
submission tool, is now available. Version 1.71
features improved handling of phylogenetic
sets of sequences and also allows users to
update their own pre-existing database
records.

11/04 dbGSS Announced

The Database of Genome Survey Sequences
(dbGSS) is now available. This database
contains more detailed information than the
corresponding records in the GSS Division of
GenBank.



10/24 Human Gene Map

The Gene Map of the Human Genome
published in the October 25 issue of Science
is available. This map shows the chromosome
location of over 16,000 human genes with
links to the underlying sequence and map
data.

10/04 Sequin

Sequin, a stand-alone sequence submission
tool, has a new release with several
enhancements, including a repeat finder and
ORF finder. New documentation and a tutorial
are available, both on the Web and in NCBI's
newsletter.



09/27 ORF Finder 



The Open Reading Frame (ORF) Finder is a
graphical analysis tool that finds all open
reading frames of a selectable minimum size
in a user's sequence or in one already in the
database.
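
The search for open reading frames of a minimum size can be sketched as below. The sketch assumes that an ORF runs from an ATG to the next in-frame stop codon and that length is counted in nucleotides including the stop; the NCBI tool may define ORFs differently, and all names here are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_length):
    """Yield (start, end, frame) for ATG...stop ORFs; 0-based start, exclusive end."""
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if pos + 3 - start >= min_length:
                    yield (start, pos + 3, frame)
                start = None


def find_orfs(seq, min_length=30):
    """Search all six reading frames; '-' coordinates refer to the reverse complement."""
    seq = seq.upper()
    hits = [("+", s, e, f) for s, e, f in orfs_in_strand(seq, min_length)]
    hits += [("-", s, e, f) for s, e, f in orfs_in_strand(reverse_complement(seq), min_length)]
    return hits
```

Raising `min_length` is what suppresses the many short, spurious frames found in any long sequence.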



09/06 Virological Software

Software for analyzing animal trials and
calculating infectious and 50% inhibitory doses
is now available. The programs VacMan and
ID-50 can now be downloaded as
self-extracting archives for either IBM or
Macintosh computers.

08/23 Complete Genome, Methanococcus jannaschii

The complete genome sequence and
annotation of Methanococcus jannaschii,
prepared by The Institute for Genomic
Research (TIGR), is now available in Entrez
Genome, as well as in GenBank, where the
1.7-megabase sequence has been separated
into 150 records of approximately 11,000 bp
each. The graphical view (as well as a link to
underlying data) of the complete genome is
present in Entrez Genome, along with the
extrachromosomal elements 1 and 2. The
complete sequence is also available by
anonymous FTP; see the README file for a
description of the various files in the genomes
division directory.

08/20 Batch Entrez

Downloading large numbers of sequence
records from Entrez is now possible through
'Batch Entrez'. Users can specify a download
for an entire set of records for a given
organism or for a set of accession numbers.
The data are saved to a file on the user's
computer.

08/05 Saccharomyces cerevisiae Database

A new database has been added to the
BLAST databases: all the nucleotide
sequences from the yeast (Saccharomyces
cerevisiae) genome sequencing project and
their encoded amino acid sequences can now
be searched with the BLAST suite of
programs.



07/26 Cn3D in Entrez 



A major new release of Network Entrez is now 
available. Release 5.0 contains Cn3D, a new
3D structure viewer integrated into Network
Entrez.



07/15 BLAST2

The BLAST2 network service is now available
on the FTP site without registration. Three
clients for multiple platforms are available:
blastcli has a convenient graphical interface
and produces the "traditional" BLAST output;
blastcl2 is a command-line client (meant
mostly for UNIX) that also produces the
traditional BLAST output; and PowerBlast
produces a one-to-many alignment, allows
filtering by organism, and allows a gapped
alignment as a post-processing of the BLAST
results. Users of the older Experimental
BLAST Network Service (with the exception of
GCG users, who are still required to register
and use the older program) are encouraged to
switch to this newest version.

05/21; see also 11/14 Sequin

Sequin is a program for submitting and
updating GenBank entries. It is designed to
simplify the sequence submission process,
provide graphical viewing and editing options,
and allow submission of segmented entries.
Sequin automatically adjusts feature table
positions as the sequence is edited. Versions
of Sequin are available through FTP for the
Macintosh, PC/Windows, UNIX, and VMS.



05/21 PowerBlast

PowerBlast is a new network BLAST
application for automated analysis of genomic
sequences. It combines BLAST searching with
filtering for low complexity regions and
repeats. It can generate organism-specific
output and compute optimal, gapped
alignments. The results are displayed
graphically and textually as multiple
alignments, with annotated features
superimposed on the aligned sequences.
Versions of PowerBlast are available through
FTP for the Macintosh, PC, SunOS, and
Solaris.



05/06 WWW BLAST 



The WWW BLAST page has been extensively 
revised: It now has both a simplified "Basic" 
Blast Search, allowing a user to search with 
the default parameters, as well as an 
"Advanced" page, where users may set 
BLAST parameters. An email option allows a 
user to receive results in a convenient form. 



04/10 WWW Entrez 



WWW Entrez now provides graphical views of 
nucleotide and protein sequences and access 
to the NCBI Genomes database, which 
contains graphical views of sequences and 
chromosome maps. Click on "Graphical view"
from an Entrez document summary or click on
the "Graphic" button from a sequence report.



03/12 Mouse/Human Homology

The Seldin/Debry Mouse/Human Homology
Relationships page presents a table
comparing genes in homologous segments of
DNA from human and mouse sources, sorted
by position in each genome.

03/07 Complete Genomes

An NCBI research project, Complete
Genomes, presents the results of analyses of
complete genome sequences. The analyses
for the genomes of Haemophilus influenzae,
E. coli (75%), and Mycoplasma genitalium are
now available.

03/08 BLAST Databases

Changes to the BLAST Databases (February
20 announcement superseded by that of
March 8.)



02/15 Homepage Reorganization

Major reorganization of the NCBI homepage
with new top-level links to additional databases
and services.

02/07 International Database Collaboration

The International Nucleotide Sequence
Database Collaboration page describes
current projects and provides links to the sites.

01/30 NCBI Structure Group

The NCBI Structure Group (Steve Bryant) has
a new page providing access to their structure
research, the PDB and MMDB databases, and
threading software.



Revised: June 6, 2002. 






Volume 12 Number 1 1984 Nucleic Acids Research



A comprehensive set of sequence analysis programs for the VAX 



John Devereux, Paul Haeberli* and Oliver Smithies



Laboratory of Genetics, University of Wisconsin, Madison, WI 53706, USA 



Received 18 August 1983 



ABSTRACT 

The University of Wisconsin Genetics Computer Group (UWGCG) has been 
organised to develop computational tools for the analysis and publication of 
biological sequence data. A group of programs that will interact with each 
other has been developed for the Digital Equipment Corporation VAX computer 
using the VMS operating system. The programs available and the conditions for 
transfer are described. 



INTRODUCTION 

The rapid advances in the field of molecular genetics and DNA sequencing 
have made it imperative for many laboratories to use computers to analyse and 
manage sequence data. UWGCG was founded when it became clear to several 
faculty members at the University of Wisconsin that there was no set of
sequence analysis programs that could be used together as a coherent system
and be modified easily in response to new ideas.

With intramural support a computer group was organized to build a strong
foundation of software upon which future programs in molecular genetics could
be based. This initial project has been completed and the resulting programs,
written in Fortran 77, are available for VAX computers using the VMS operating
system. Most of the programs can be used with only a terminal, although
several require a Hewlett Packard plotter. 

UWGCG software has been installed for testing at eight different 
institutions. A simple method has been developed for transferring and 
maintaining this system on other VAX computers. 

DESIGN PRINCIPLES 

UWGCG program design is based on the "software tools" approach of
Kernighan and Plauger(1). Each program performs a simple function and is easy
to use. The programs can be used independently in different combinations so
that complex problems are solved by the use of several programs in succession.
New programming is simplified since less effort is required to bridge a gap
between existing programs.

UWGCG software is designed to be maintained and modified at sites other
than the University of Wisconsin. The program manual is extensive and the
source codes are organised to make modification convenient. Scientists using
UWGCG software are encouraged to use existing programs as a framework for
developing new ones. Our copyright can be removed from any program modified
by more than 25% of our original effort.

PROGRAMS AVAILABLE FROM UWGCG 

The programs described below are named and defined individually in Table 1. 

Program names in the text are underlined. 

Comparisons 

Comparisons may be done with "dot plots" using the method of Maizel and
Lenk(2). Optimal alignments can be generated by the methods of Needleman and
Wunsch(3), of Sellers(4), and the "local homology" method of Smith and
Waterman(5). The Smith and Waterman alignment algorithm is also the most
sensitive method available for identifying similarities between weakly related
sequences.
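
The "local homology" idea can be illustrated with a short dynamic-programming sketch. It assumes simple scores (match +2, mismatch -1) and a linear gap penalty of -1 per position; these values and the function name are illustrative assumptions and are not the parameters or code of the UWGCG programs.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Resetting negative scores to zero lets a short region of similarity stand out even when the flanking sequence is unrelated, which is why the method suits weakly related sequences.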

Mapping and Searching 

Mapping is available in several formats. Graphic maps display all of the
cuts for each restriction enzyme on parallel lines. This graphic map
facilitates selection of enzymes for isolating any region of a sequenced DNA
molecule. Sorted maps in tabular format arrange the fragments from any
digestion in order of molecular weight to show which fragments are similar in
size and thus likely to be confused in gels. Another frequently used mapping
format, designed by Frederick Blattner(6), displays the enzyme cuts above the
original DNA sequence. Both strands of the DNA and all six frames of
translation are shown.
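
A sorted map of the kind described above can be sketched as follows. The sketch assumes a linear molecule, uses fragment length in base pairs as a stand-in for molecular weight, and includes only three well-known enzymes; the names and data layout are illustrative and are not those of MapSort.

```python
import re

# Recognition site and cut offset within the site on the top strand.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(seq, site, offset):
    """Return 0-based positions of the cuts made at every occurrence of the site."""
    # A lookahead is used so that overlapping occurrences are also found.
    return [m.start() + offset for m in re.finditer("(?=" + site + ")", seq)]


def fragments_sorted_by_size(seq, enzyme):
    """Return fragment lengths from a complete digest, largest first."""
    site, offset = ENZYMES[enzyme]
    cuts = cut_positions(seq.upper(), site, offset)
    bounds = [0] + cuts + [len(seq)]
    sizes = [end - start for start, end in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)
```

Fragments with nearly equal sizes end up next to each other in the sorted list, which is how such a table flags bands likely to be confused on a gel.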

All mapping programs will search for user-specified sequences, allowing
features to be marked at the appropriate position on a restriction map. The
mapping and searching programs can be used to aid site-specific mutagenesis
experiments by showing where mutations could generate new restriction sites.
All of the positions in a sequence where a synthetic probe could pair with one
or more mismatches can also be located. Sequences related to less precisely 
defined features such as promoters or intervening sequence splice sites, can 
be located with a program that uses a consensus sequence as a probe. The 
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Programs Available from UWGCG 



Name              Function

DotPlot+          makes a dot plot by method of Maizel and Lenk(2)
Gap               finds optimal alignment by method of Needleman and Wunsch(3)
BestFit           finds optimal alignment by method of Smith and Waterman(5)

MapPlot+          shows restriction map for each enzyme graphically
MapSort           tabulates maps sorted by fragment position and size
Map               displays restriction sites and protein translations above
                  and below the original sequence (Blattner, 6)
Consensus         creates a consensus table from pre-aligned sequences
FitConsensus      finds sequences similar to a consensus sequence using a
                  consensus table as a probe
Find              finds sites specified interactively

StemLoop          finds all possible stems (inverted repeats) and loops
Fold*             finds an RNA secondary structure of minimum free energy
                  by the method of Zuker(7)

CodonPreference+  plots the similarity between the codon choices in each
                  reading frame and a codon frequency table(8)
CodonFrequency    tabulates codon frequencies
Correspond        finds similar patterns of codon choice by comparing
                  codon frequency tables (Grantham et al, 9)
TestCode+         finds possible coding regions by plotting
                  the "TestCode" statistic of Fickett(10)
Frame+            plots rare codons and open reading frames(8)
PlotStatistics+   plots asymmetries of composition for one strand
Composition       measures composition, di and trinucleotide frequencies
Repeat            finds repeats (direct, not inverted)
Fingerprint       shows the labelled fragments expected for an RNA fingerprint

Seqed             screen oriented sequence editor for entering, editing
                  and checking sequences
Assemble          joins sequences together
Shuffle           randomises a sequence maintaining composition
Reverse           reverses and/or complements a sequence
Reformat          converts a sequence file from one format to another
Translate         translates a nucleotide into a peptide sequence
BackTranslate     translates a peptide into a nucleotide sequence
Spew              sends a sequence to another computer
GetSeq            accepts a sequence from another computer
Crypt             encrypts a file for access only by password
Simplify          substitutes one of six chemically similar amino acid
                  families for each residue in a peptide sequence

Publish           arranges sequences for publication
Poster+           plots text (for labelling figures and posters)
OverPrint         prints darkened text for figures with a daisy wheel printer

+ requires a Hewlett Packard Series 7221 terminal plotter
* Fold is distributed by Dr. Michael Zuker not UWGCG.
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mapping programs can also be used on protein sequences to identify the 
peptides resulting from proteolytic cleavage. 
Secondary Structure 

Three programs are available to examine secondary structure in nucleic 
acids. The program StemLoop identifies all inverted repeats. An 
implementation of Dr. Michael Zuker's Fold program(7) finds an RNA secondary 
structure of minimum free energy based on published values of stacking and 
loop destabilizing energies. The "dot plot" comparison (mentioned above) of a 
sequence compared to its opposite strand gives a graphic picture of the 
pattern of inverted repeats in a sequence. 
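
The search for inverted repeats can be sketched as a scan for a short stem followed, after a loop, by its reverse complement. The stem length, loop range and function names below are assumptions chosen for illustration; the source does not describe how StemLoop is implemented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop length) for each perfect inverted repeat of length `stem`."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            # The right arm must pair with the left arm when folded back.
            if seq[j:j + stem] == reverse_complement(left):
                hits.append((i, loop))
    return hits
```

Only perfect stems are reported here; tolerating mismatches or G-T pairing would need a scoring step in place of the equality test.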
Analysis of Composition and the Location of Genetic Domains 

Regions of a sequence with non-random base distribution can be displayed 
with three graphic tools designed to identify genetic domains. The program 
CodonPreference(8) identifies potential coding regions by searching through
each reading frame for a pattern of preferred codon choices. The
CodonPreference plot predicts the level of translational expression of mRNAs
and helps identify frame shifts in DNA sequence data. Patterns of codon
choice can be compared with the program Correspond(9). When a strong pattern
of codon preferences is not expected, the "TestCode" statistic of Fickett(10)
can be plotted to show regions of compositional constraint at every third
base. Another program plots asymmetries of composition by strand. Strand
asymmetries have been associated with genetic domains by several
authors(11)(12). A fourth program called Frame marks the positions of rare
codons and open reading frames on a graph showing all six reading frames.

Several tools are available to measure content and to count dinucleotide, 
trinucleotide, neighbor and repeat frequencies. A program that predicts RNA 
fingerprint patterns and another that tabulates codon frequencies complete the
group of programs that analyse composition.
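
Counting base content and dinucleotide or trinucleotide frequencies can be sketched with a sliding window. The function names are illustrative and overlapping windows are an assumption; the source does not say how the Composition program counts.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping window of length k (k=2 dinucleotides, k=3 trinucleotides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def base_composition(seq):
    """Return the fraction of each base present in the sequence."""
    counts = kmer_counts(seq, 1)
    total = sum(counts.values())
    return {base: counts[base] / total for base in sorted(counts)}
```

Codon frequencies differ from trinucleotide frequencies only in that the window advances three bases at a time within one reading frame.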
Sequence Manipulation 

Sequences may be entered, assembled, edited, reversed, randomised, 
reformatted, translated, back- translated, documented, transferred, or 
encrypted rapidly with a large set of sequence manipulation tools.

A screen-oriented editor is available that allows sequences to be entered
and checked. After a sequence is entered, it may be reentered for
proofreading. Whenever a reentered base is at variance with the original, the
terminal bell rings and the position is marked. Existing sequences can be
edited quickly by moving directly to a sequence position specified by either a
coordinate or a sequence pattern. The program can reassign the terminal's
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keys to place G, A, T and C conveniently under the fingers of one hand in the 
same order as the lanes of a sequencing gel. 

Programs are available for changing sequence file format. Sequence data 
from any source can be used in UWGCG programs, and sequence files maintained
with UWGCG software can be converted for use in other non-UWGCG programs. For
instance, the programs of Roger Staden(13) or Intelligenetics Inc.(14) could
be used to assemble a sequence from the sequences of many small sub-fragments
generated by DNAase I digestion. The assembled sequence could then be
reformatted for use in any UWGCG program. A program is available that
transfers sequences to and from other computers.

Sequence Publication

A program, Publish, will format sequences into figures. Publish has
alternatives for line size, numbering, scaling, translation and comparison to 
other sequences. Poster is a program that will plot text on figures. 

GENERAL FEATURES OF UWGCG SOFTWARE

Interactive Style

Each program is run by simply typing its name. Every parameter required 
by the program is obtained interactively. Questions are answered with a file 
name, a yes, a no, a number, or a letter from a menu. Default answers are 
displayed. Programs are insensitive to absurd answers and will ask the 
question again if, for instance, you name a file that does not exist or if you 
use a nonnumeric character when typing a number. Special features such as 
plotting features oriented to publication, are obtained by using an extra word 
next to the program's name when the program is run. Thus parameter queries
are kept to a minimum for the normal use of each program. 
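
The interactive style described above (a displayed default, and a repeated question after an unusable answer) can be sketched as a small prompt loop. This is a Python illustration under stated assumptions, not the Fortran 77 code of the UWGCG programs; the function name, the prompt format and the `read` parameter are hypothetical.

```python
def ask_number(prompt, default, read=input, low=None, high=None):
    """Ask for a number, show the default, and ask again after an unusable answer."""
    while True:
        answer = read(f"{prompt} [{default}]: ").strip()
        if answer == "":
            return default  # an empty reply accepts the displayed default
        try:
            value = float(answer)
        except ValueError:
            continue  # nonnumeric characters: ask the question again
        if (low is not None and value < low) or (high is not None and value > high):
            continue  # out-of-range answer: ask the question again
        return value
```

Passing the reader in as a parameter keeps the loop testable without a terminal; a file-name question would follow the same pattern with an existence check in place of the numeric conversion.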
Data 

Both the NIH-GenBank (15) and the EMBL (16) nucleotide sequence data
libraries are available "on-line" to any UWGCG program. A Search utility will
locate sequences in the libraries by key word. A Find utility will locate
library entries containing any specified sequence. A program is available
that installs the new data sent periodically from GenBank and EMBL to update
their data libraries. 

All of the data in the system are stored in text files that can be read 
and modified easily. Every data file has an English heading describing the 
contents. The data files may be copied by each user for analysis or
modification. Programs recognize and read user-modified input data 
automatically. Data files can be modified with any text editor.
Sequence File Structure

Sequences are maintained in files that allow documentation and numbering 
both above and within the sequence. This file format is compatible with both 
of the nucleic acid sequence libraries and has been adopted as the standard 
sequence file format by the data base project at the European Molecular 
Biology Lab. Because genetic manipulations commonly involve linking several 
molecules of known sequence, UWGCG sequence files are designed to support 
concatenation by allowing comments to appear within the sequences at any 
location. Coding sequences or the boundaries between cloning vector and
insert, for instance, can be marked within the sequence itself for immediate
identification.
Sequence Symbols

All possible nucleotide ambiguities and all standard one-letter amino
acid codes are part of the UWGCG symbol set, which includes all alphabetic
characters plus five additional characters. The proposed IUB-IUPAC standard
nucleotide ambiguity symbols (17) are used for the mapping, searching and
comparison programs. Lower case characters are used in sequences to indicate
uncertainty as distinct from ambiguity. This allows the entire lexicon of
symbols to be reused with the same meaning, but with the prefix "maybe-." This
reuse of the symbol set in lower case makes the uncertainty symbols more 
complete, understandable and visible. 
Symbol Comparison 

Sequence analysis programs generally make comparisons between sequence 
symbols (bases or amino acids) in order to find enzyme sites, create 
alignments, locate inverted repeats etc. These symbol comparisons are handled 
in several ways. 

Symbol comparisons for alignment, comparison and secondary structure 
analysis are made by looking up a value in a symbol comparison table for the 
quality of the match. The table might contain 1's for matches and 0's for
mismatches. If amino acids are being compared, however, a real number could
be assigned at each position based on some previously assigned chemical
similarity of the pair of residues or on the mutational distance between their
codons. Standard symbol tables are provided by UWGCG, but the system is
designed to allow each user to specify his own values. 
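
The table lookup described above can be sketched in a few lines of Python. This is only an illustration: the table contents, the name `match_quality`, and the half-credit values for transitions are assumptions made for the example, not UWGCG's actual tables or procedures.

```python
# Hypothetical symbol comparison table: 1 for a match, 0 for a mismatch.
SYMBOLS = "GATC"
IDENTITY_TABLE = {(a, b): (1.0 if a == b else 0.0)
                  for a in SYMBOLS for b in SYMBOLS}


def match_quality(seq1, seq2, table=IDENTITY_TABLE):
    """Sum the table value for each aligned pair of symbols (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(table[(a, b)] for a, b in zip(seq1, seq2))


# A user-specified table can replace the default. Here transitions
# (G<->A, T<->C) are given an arbitrary partial score of 0.5.
CUSTOM_TABLE = dict(IDENTITY_TABLE)
for a, b in (("G", "A"), ("A", "G"), ("T", "C"), ("C", "T")):
    CUSTOM_TABLE[(a, b)] = 0.5
```

Passing `CUSTOM_TABLE` as the `table` argument changes the scoring without touching the comparison code, which is the design point the text makes about user-specified values.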

Symbol comparisons for mapping and searching operations in nucleic acids
are made by converting the IUB-IUPAC symbols into a binary code. The bits of
this code represent G, A, T and C, with ambiguity symbols causing more than one
bit to be set. A group of library functions identifies overlap between the bits
for each IUB-IUPAC symbol.
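
A minimal Python sketch of this bit-code idea follows. The particular bit assignments, the function names, and the example pattern are assumptions for illustration; the ambiguity symbols are the IUPAC codes as they are commonly used today, and lower case (uncertain) symbols are simply treated like their upper case counterparts.

```python
# One bit per base; the specific bit values are an arbitrary choice.
G, A, T, C = 1, 2, 4, 8

IUPAC_BITS = {
    "G": G, "A": A, "T": T, "C": C,
    "R": A | G, "Y": C | T, "M": A | C, "K": G | T,
    "S": C | G, "W": A | T,
    "H": A | C | T, "B": C | G | T, "V": A | C | G, "D": A | G | T,
    "N": A | C | G | T,
}


def symbols_overlap(x, y):
    """True if the two symbols share at least one possible base."""
    return (IUPAC_BITS[x.upper()] & IUPAC_BITS[y.upper()]) != 0


def find_site(pattern, sequence):
    """Return the start indices where every pattern symbol overlaps the sequence."""
    width = len(pattern)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if all(symbols_overlap(p, s) for p, s in zip(pattern, window)):
            hits.append(i)
    return hits
```

For example, `find_site("GANTC", "AAGATTCGAGTCA")` reports the two places where the ambiguous pattern can match, because a single bitwise AND decides each symbol comparison.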

Documentation 

Documentation is available both in printed form and on the terminal 
screen. A 350 page manual describes the operation of each program in detail, 
gives practical considerations and shows what will appear on the screen during 
a session with the program. Output files and plots are shown for the session. 
The data for the session shown in the documentation are included with the 
system so that each program's operation can be checked. The "on-line"
documentation is the same as the manual, but can be changed immediately when a 
program is modified. 

All programs write output to files that are completely documented and 
sensibly organised for input to other programs. The input data, the program 
and the parameters used are clearly identified in every output file.
Procedure Library 

UWGCG programs are written largely as calls to a library of 250
procedures designed to manipulate biological sequences. These procedures use 
data and file structures which have been designed to simplify program 
modification. For instance, standard operations such as reading sequences 
from files are always handled by a single library procedure. Thus a change in 
sequence file format requires only one subroutine to be modified for the new 
format to be acceptable to all of the programs in the system. Command
procedures are available to help modify the library. The procedure library
can be used by programs written in any language. 

DISTRIBUTION OF UWGCG SOFTWARE
Intent 

The intent of UWGCG is to make its software available at the lowest 
possible cost to as many scientists as possible. 
Fees 

A fee of $2,000 for non-profit institutions or $4,000 for industries is 
being charged for a tape and documentation for each computer on which UWGCG 
software is installed. While no continuing fee is required, UWGCG software, 
like the field it supports, is changing very rapidly. A consortium of 
industries and academic laboratories is planned to support the project in the 
future. The consortium will entitle its members to periodic updates and to 
influence the direction of new programming undertaken by UWGCG in return for a 
pledge of continuing financial support.
Copyrights

UWGCG retains the copyrights to all of its software and UWGCG must be
contacted before all or any part of its software package is copied or
transferred to any machine. UWGCG is, however, mandated to provide research 
tools to help scientists working in the area of molecular genetics and we are 
glad to see our source codes become the basis of further programming efforts 
by other scientists. Copyright can be removed for any program modified by 
more than 25% of its original effort.
Tape Format

The UWGCG package is usually distributed in VAX/VMS "backup" format on a
9 track magnetic tape recorded at 1600 bits/inch. The system consists of
about 1000 files using about 20,000 blocks at 512 bytes/block. The current 
versions of the GenBank and EMBL nucleotide sequence data bases are normally 
included which add another 3,000 files and require another 20,000 blocks. 

Upon request UWGCG will make a card image tape of all of the Fortran 77 
programs and procedures for reading on computers other than the VAX. The card 
image tape is usually provided at 1600 bits/inch with 80 characters/record and
10 records/block. Adaptation of UWGCG software to systems other than VAX/VMS
may take considerable effort. 

Equipment Required 

UWGCG programs and command procedures will run on a Digital Equipment 
Corporation (DEC) VAX computer that is using version 3.0 or greater of the DEC 
VMS operating system. A tape drive is necessary; a floating point accelerator 
and a DEC Fortran compiler are helpful, but not required. All programs can be
run from a DEC VT52 or VT100 terminal. Seven programs, as noted in table 1, 
require a Hewlett Packard 7221 terminal plotter wired in series with the 
terminal. Several utilities support a daisy wheel compatible printer attached
to the terminal's pass-through port; however, all programs write output files
suitable for printing on any standard device. 
Inquiries 

Inquiries may be sent to John Devereux at the Laboratory of Genetics,
University of Wisconsin, Madison, WI, USA 53706, (608) 263-8970. UWGCG is not
licensed to distribute Fold (7), but the UWGCG implementation is available from
Michael Zuker, Division of Biological Sciences, National Research Council of
Canada, 100 Sussex Drive, Ottawa, Canada, K1A 0R6, (613) 992-4182.
ABSTRACT

5'-Noncoding sequences have been tabulated for 211 messenger RNAs from higher
eukaryotic cells. The 5'-proximal AUG triplet serves as the initiator codon
in 95% of the mRNAs examined. The most conspicuous conserved feature is the
presence of a purine (most often A) three nucleotides upstream from the AUG
initiator codon; only 6 of the mRNAs in the survey have a pyrimidine in that
position. There is a predominance of C in positions -1, -2, -4 and -5, just
upstream from the initiator codon. The sequence CC(A/G)CCAUG(G) thus emerges as
a consensus sequence for eukaryotic initiation sites. The extent to which the
ribosome binding site in a given mRNA matches the -1 to -5 consensus sequence
varies: more than half of the mRNAs in the tabulation have 3 or 4 nucleotides
in common with the CCACC consensus, but only ten mRNAs conform perfectly.



INTRODUCTION 

Two years ago I prepared a compilation of the then-available 5'-noncoding
sequences of eukaryotic mRNAs (Curr. Topics Microbiol. Immunol. 93, 81-123, 1981).
Apart from a bevy of globin and histone mRNAs, only 32 other cellular mRNA
sequences were known at that time. In contrast, there are 166 cellular mRNAs in
the present compilation, not counting the globins and histones. I have excluded
the mRNAs of lower eukaryotes and viruses, only to keep the survey manageable.
One of my objectives was to determine whether certain patterns noted in the earlier
compilation would be evident with this larger, more diversified set of sequences. 

A few points about the selection and presentation of the sequences require 
explanation. In cases where numerous representatives of a gene family have been 
sequenced, I have omitted many and chosen those in which the leader sequences 
show the most divergence. There are exceptions, however. It seemed useful to
include certain pairs of mRNAs in which the leader sequences show extensive
homology except near the AUG initiator codon (e.g., human versus rat preproinsulin).
The opposite pattern is also provocative; i.e., sequence conservation only near
the AUG codon, as in human versus rat immunoglobulin E. Upon inspecting the
completed compilation, only two families of mRNAs appeared to be excessively
represented: histones and globins. The 5'-noncoding sequences of histone
mRNAs are sufficiently varied that they pose little danger of distorting the
search for homology among ribosome binding sites. This is less true of the
globin sequences, and I have controlled for this as described later in the text. 

Nucleotide sequences determined by analyzing genomic DNA have been included
only when there is sufficient supplementary data to identify introns that lie
upstream from the AUG codon (or to verify their absence) and to estimate the
location of the 5'-end of the mRNA. The 5'-end of each mRNA in the table was
identified according to one of the following criteria: 

(a) Direct sequence analysis of the purified mRNA. 

(b) Primer-extension and/or mapping with a single-strand specific nuclease, such
as S1. With these techniques there is often a 2- to 4-nucleotide ambiguity
in pinpointing the cap site. 

(c) Termination of the longest cDNA clone. When the cDNA clone is known to stop 
significantly short of the 5'-end of the mRNA, the sequence in the table is
preceded by an ellipsis (...). 

(d) Sequence homology with the corresponding gene from a closely-related species 
in which the 5'-terminus of the mRNA has been mapped.

(e) Presence in the genomic DNA sequence of an appropriately-positioned TATA box, 
25- to 30-nucleotides upstream from the presumptive cap site. In the absence 
of other supporting data this criterion is rather weak. 

The following criteria, identified by code letters in the rightmost column
of the table, were used to identify the AUG initiator codon in each message:

(a) The nucleotide sequence corresponds to the known N-terminal amino acid
sequence of the primary translation product. In some cases amino acid and
nucleotide sequence data were derived from two different but related organisms.

(b) The N-terminal amino acid sequence has been determined only for the mature
protein, which is known (or presumed) to derive from a precursor that carries
an N-terminal extension (the "signal peptide") of 15 to 30 amino acids. The
indicated AUG triplet is the only candidate initiation site compatible with 
the synthesis of such a precursor. 

(c) The nucleotide sequence has a single open reading frame which either corres- 
ponds in size to the known molecular weight of the encoded protein, or in- 
cludes peptides that are known to be present in the mature protein. 

(d) The indicated AUG triplet occurs at the beginning of the longest open reading 
frame, but the exact size of the primary translation product is not available
for comparison. This criterion is rather weak. 

(e) The initiation site was deduced from sequence homology with the corresponding 
gene from a closely-related species in which the start site has been defined. 

(f) Under conditions that allow formation of initiation complexes in vitro, the 
indicated AUG triplet was protected by ribosomes against nuclease digestion.

In 13 of the mRNAs in the table the functional initiator codon has not been
definitively identified; the structure of the encoded protein is compatible with
initiation at either of two nearby AUG triplets. In such cases I have predicted
which AUG is most likely to be the (major) initiation site. Those entries are
marked with an asterisk in the rightmost column. The AUG initiator codon was
predicted based on position (i.e., proximity to the 5'-end of the mRNA) and
conformity to the consensus sequence CCACCAUG. In a later section of the text I
will explain in greater detail how this was done.

Figure 1. Length distribution of the 5'-noncoding portion of eukaryotic
mRNAs. To avoid weighting the distribution by the large number of globin
mRNAs in the sequence table, I scored the globin mRNAs only once in this
tally.



DISCUSSION 

A few generalizations emerge from inspection of the sequences tabulated herein. 

(i) The length of the 5'-noncoding region varies widely, from 3 to 572
nucleotides. However, 70% of the leader sequences are clustered in the 20- to
80-nucleotide range, as shown in Figure 1. The unusually long leader sequences
occur on unusually interesting mRNAs (epidermal growth factor, oncogenes, heat
shock proteins), inviting speculation that the structure of the 5'-noncoding
region participates in the regulated expression of those genes.

(ii) Translation begins at the 5'-proximal AUG triplet in 95% of the mRNAs
tabulated herein. There are only ten mRNAs listed in which one or more AUG
triplets occur upstream from the recognized initiation site. The number of
"nonfunctional" upstream AUG codons in each of those messages is shown in
parentheses at the right edge of the table. [The upstream AUGs are called
"nonfunctional" because there is as yet no evidence that ribosomes recognize
those sites, but theory predicts that ribosomes should initiate (inefficiently)
at the upstream AUG triplets as well as at the AUG codon that heads the long
open reading frame.] The number of mRNAs with upstream AUG triplets would
increase to 15 if my predictions are correct about which AUG is the major site
of initiation in entries 80, 141, 146, 179 and 205. I have dealt elsewhere with
the question of how ribosomes get past the upstream AUG triplet(s) in such
messages (Kozak, Microbiol. Rev., 47, 1-45, 1983; Kozak, manuscript submitted).
The main point to note here is that such mRNAs are rare. The "first-AUG-rule"
holds for 93% to 95% of the entries in the table.

Figure 2. Frequency distribution of each nucleotide around the functional
initiator codon in 198 mRNAs listed in the table. The calculations presented
here do not include the 13 mRNAs in which the translational start site was
predicted but has not been verified. The nucleotide immediately preceding the
AUG codon is numbered -1; nucleotides +4 to +6 represent the start of the
protein coding sequence. The dotted line across each panel indicates the 25%
value that would be expected on a random basis. To ensure that the results were
not distorted by the inclusion of too many closely-related globin mRNA
sequences, I recalculated the frequency of occurrence of each nucleotide in
positions -1 through -8, omitting all of the globin sequences. Although the
absolute values changed somewhat (e.g., G in position -6 dropped from 40% to
36%), the rank order of nucleotides in each position remained unchanged.
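
The first-AUG rule of point (ii) is easy to check mechanically once the functional initiator codon of a message is known. The following Python sketch is offered only as an illustration; the function names and the two short test sequences are invented for the example and are not entries from the table.

```python
def upstream_aug_count(mrna, initiator_index):
    """Count AUG triplets (in any frame) that begin upstream of the initiator.

    mrna is the message written 5' to 3'; initiator_index is the 0-based
    index of the A of the functional AUG codon.
    """
    count = 0
    pos = mrna.find("AUG")
    while pos != -1 and pos < initiator_index:
        count += 1
        pos = mrna.find("AUG", pos + 1)
    return count


def obeys_first_aug_rule(mrna, initiator_index):
    """True if the functional initiator is the 5'-proximal AUG triplet."""
    return mrna.find("AUG") == initiator_index
```

Applied to every entry of a compilation, the fraction of messages for which `obeys_first_aug_rule` is true corresponds to the percentage quoted in the text.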

(iii) The sequences in the table have been searched manually for signs of 
a conserved motif that might uniquely identify AUG initiator codons. The most 
conspicuous conserved feature is presence of a purine (most often A) in position 
-3; i.e., three nucleotides upstream from the initiator codon. As illustrated 
in Figure 2, 79% of the mRNAs that were counted have A in that position, 18% have
G, and only 3% (a total of 6 messages) have a pyrimidine in position -3. The
strong preference for a purine in position -3 is peculiar to AUG triplets that 
serve as initiator codons. Pyrimidines are favored in the -3 position preceding 
AUG triplets that lie upstream from the initiation site, in those rare mRNAs that 
have upstream AUGs (Kozak, Nuc. Acids Res. 9, 5233-5252, 1981); and the nucleo- 
tide frequency in position -3 is almost perfectly random around AUG triplets that 
code for methionine at internal positions in polypeptide chains (Kozak, 1983, op. 
cit.). Although no other position is as highly conserved as the purine in posi- 
tion -3, the distribution of nucleotides is decidedly nonrandom in every position 
from -1 through -6, and perhaps beyond. The predominance of C in positions -1,
-2, -4 and -5 was evident in an earlier survey (Kozak, 1981, op. cit.) and is
confirmed here. The preference for G in position +4, noted in the previous survey,
is less evident here. From the data in Figure 2, the sequence CC(A/G)CCAUG(G) emerges
as a consensus sequence for eukaryotic initiation sites. The extent to which a
given message matches the -1 to -5 consensus sequence varies considerably: only
10 mRNAs in the table conform perfectly to the CCACC sequence; in more than half
of the mRNAs, 3 or 4 of the nucleotides directly preceding the AUG codon match 
the consensus sequence; about 10% of the mRNAs have a purine in position -3 but 
otherwise differ entirely from the -1 to -5 consensus. The 6 mRNAs in the tabu- 
lation that lack a purine three nucleotides upstream from the initiator codon do 
not seem to compensate by conforming closely to the other four consensus positions. 
Recent site-directed mutagenesis experiments (Kozak, manuscript submitted) have 
confirmed the importance of the purine in position -3, but there is as yet no evi- 
dence that cytosine in positions -1, -2, -4 and -5 contributes to recognition of 
eukaryotic initiation sites. 
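
The kind of tally that underlies a position-by-position frequency distribution can be sketched as follows. This is a hedged illustration in Python, not the procedure actually used (the text states that the sequences were searched manually); the function name and the four toy sequences in the usage note are assumptions.

```python
from collections import Counter


def position_frequencies(contexts, positions=range(-6, 0)):
    """Percentage of each base at the given upstream positions.

    contexts is a list of (mrna, initiator_index) pairs, where
    initiator_index is the 0-based index of the A of the initiator AUG.
    Position -1 is the base immediately preceding the AUG.
    """
    table = {}
    for pos in positions:
        counts = Counter(
            mrna[idx + pos]
            for mrna, idx in contexts
            if idx + pos >= 0  # skip leaders too short to reach this position
        )
        total = sum(counts.values())
        table[pos] = (
            {base: 100.0 * counts[base] / total for base in "GAUC"}
            if total else {}
        )
    return table
```

With the invented inputs `[("CCACCAUGG", 5), ("GCAGCAUGA", 5), ("UUACCAUGG", 5), ("CCGCCAUGC", 5)]`, position -3 comes out as 75% A and 25% G, and position -1 as 100% C.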

Obviously the (semi) conserved sequence revealed by a survey such as this 
need not correspond to the most favorable context for initiation, since the table 
includes mRNAs that vary in translational efficiency. Nonetheless, reference to 
the consensus sequence, especially the highly conserved -3 position, can be of 
help when searching a new mRNA sequence to locate the translational initiation 
site. It is important to avoid two errors when using this approach: 

(a) If inspection of the sequence near the 5'-end of the mRNA were to reveal two
AUG triplets that conform approximately equally to the consensus sequence, it
would be incorrect to conclude that either AUG is equally likely to be the
initiator codon. Because 40S ribosomal subunits most likely scan the 5'-end of the
mRNA in a linear fashion (Kozak, Cell 34, 971-978, 1983), the 5'-proximal AUG
triplet is the first to be "inspected." If the sequence preceding the first AUG
triplet conforms closely to the consensus, especially if an A occurs in position
-3, the search ends there. There are two exceptions to this rule. The first
involves a small number of mRNAs in which the reading frame following the first
AUG triplet is short, terminating upstream from a second AUG codon to which
ribosomes seem to gain access by reinitiating. The second exception consists
of a single example: the mRNA derived from influenza B virus genome segment 6 
allows ribosomes to initiate efficiently at the first and the second AUG codons, 
although the first AUG triplet occurs in a "good" context (ANNAUGA) and is not 
followed by a terminator codon (Shaw et al., Proc. Natl. Acad. Sci. USA 80,
4879-4883, 1983). I have no explanation for this at present.

(b) An AUG triplet that deviates from the consensus in the crucial -3 position can
nevertheless serve as the initiator codon. This is evidenced by a few mRNAs in
the table (entries 40, 98, 129, 133, 134, 196) and also by experimental
manipulation of the sequence flanking the initiator codon (Sherman et al., Cell 20,
215-222, 1980; Kozak, manuscript submitted). As a consequence of initiating at a
"weak" AUG codon, however, those rare messenger RNAs are predicted to
have two special properties: translation should be inefficient; and ribosomes
should initiate not only at the first (weak) AUG but also at the next AUG that
lies downstream. Such mRNAs should therefore have the potential to direct
synthesis of two proteins. This has been shown to occur with a few viral mRNAs
(Kozak, 1983, op. cit.) but it has yet to be demonstrated for cellular mRNAs.
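
When searching a new mRNA sequence along the lines of points (a) and (b), it helps to list every AUG in 5'-to-3' order together with its context. The Python sketch below does only that; it is a hypothetical helper, and it deliberately stops short of predicting the initiator, since the text warns that context alone does not settle the question.

```python
CONSENSUS = "CCACC"  # positions -5 to -1, as given in the text


def aug_contexts(mrna):
    """List (index, base at -3, matches to CCACC, purine at -3?) for each AUG.

    AUGs are reported in 5'-to-3' order, the order in which a scanning
    ribosome would encounter them. Missing upstream bases are shown as '-'.
    """
    results = []
    pos = mrna.find("AUG")
    while pos != -1:
        upstream = mrna[max(0, pos - 5):pos].rjust(5, "-")
        minus3 = upstream[2]  # index 0 is position -5, so index 2 is -3
        matches = sum(1 for a, b in zip(upstream, CONSENSUS) if a == b)
        results.append((pos, minus3, matches, minus3 in "AG"))
        pos = mrna.find("AUG", pos + 1)
    return results
```

For the invented sequence `GUUUCAUGGCCACCAUGG`, the first AUG has U at -3 and one match to CCACC, while the second has A at -3 and five matches.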

The -1 to -5 consensus sequence detected in this survey differs from
previously-suggested eukaryotic consensus sequences (Hagenbuchle et al., Cell 13,
551-563, 1978; Baralle and Brownlee, Nature 274, 84-87, 1978; Stiles et al., Cell
25, 277-284, 1981) in both its high frequency of occurrence and its constant
position relative to the AUG initiator codon. None of the previously-suggested
consensus sequences met those criteria. Until further experiments are carried out,
it is premature to speculate about the mechanism by which flanking nucleotides
might modulate recognition of the AUG initiator codon; but the temptation is
irresistible. Sargan et al. (FEBS Lett., 147, 133-136, 1982) have noted an
intriguing complementarity between the sequence CCACC in mRNA and the sequence
GGUGG at the base of the 3'-terminal hairpin structure in 18S ribosomal RNA. The
possibility of base pairing between mRNA and rRNA thus seems worth exploring. An
alternative rationalization for the conserved sequence preceding the initiator codon
is that it might base-pair with a complementary sequence just downstream from the
AUG codon. The resulting hairpin could help to identify the initiation site.
Although some mRNAs (see entries 18, 66, 151) have the potential to form a stable
hairpin structure centered about the AUG codon, this is by no means universal.
Moreover, comparison of closely-related sequences does not reveal compensatory
changes that would preserve the potential hairpin structure.
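
The complementarity between CCACC and GGUGG can be verified with a short Python sketch. The function names are assumptions, and only Watson-Crick pairs are considered (G-U wobble pairs are ignored for simplicity).

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the Watson-Crick reverse complement of an RNA written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def can_pair(seq_5to3, other_5to3):
    """True if the two segments are perfectly complementary when antiparallel."""
    return reverse_complement(seq_5to3) == other_5to3
```

Here `can_pair("CCACC", "GGUGG")` is true. The same test applied to the segments on either side of an AUG codon would indicate whether a message could form the kind of hairpin discussed above.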

Bibliography. The numbers used here to identify each mRNA correspond to those
in column 1 of the table. Bibliographic data are presented in condensed form:
first author, year, journal title, volume, first page. Personal communications
are indicated by the letters pc after an individual's name. My thanks are
extended to those individuals.
ABSTRACT

Sequences flanking the initiator codon in eukaryotic mRNAs are not random.
Out of 153 messages examined, 151 have either a purine in position -3, or a G
in position +4, or both. Thus, (A/G)XXAUGG emerges as the favored sequence for
eukaryotic initiation sites. Nucleotides flanking nonfunctional AUG triplets,
which occur in the 5'-noncoding region of a few eukaryotic messages, are
different from those found at most functional sites. Whereas most authentic
initiator codons are preceded by a purine (usually A) in position -3, most
nonfunctional AUGs have a pyrimidine in that position. The observed asymmetry
suggests that purines in positions -3 and +4 might facilitate recognition of
the AUG codon during formation of initiation complexes. To test this idea,
in vitro binding studies were carried out with 32P-labeled oligonucleotides.
Binding of AUG-containing oligonucleotides to wheat germ ribosomes was
significantly enhanced by placing a purine in position -3 or +4. The scanning model,
which postulates that 40S ribosomal subunits attach at the 5'-end of a message
and migrate down to the AUG codon, is discussed in light of these new
observations. A modified version of the scanning mechanism is proposed.

INTRODUCTION 

The pivotal role of the AUG codon in defining the start site for protein 
biosynthesis has long been recognized. Only a small fraction of the AUG trip- 
lets in a given message function as ribosome binding sites, however. In pro- 
karyotic messenger RNAs, the major ancillary signal that dictates which AUG 
codons will be selected by ribosomes is a purine-rich sequence centered about 
ten nucleotides upstream from the AUG triplet (1). Other sequences located
farther to the left (2-4) or right (5,6) of the AUG codon have also been shown to
influence translational efficiency, at least in some messages. In prokaryotes,
the role of the purine-rich sequence preceding the initiator codon was deduced 
from comparison of nucleotide sequences among a large number of messages, and 
from manipulation of messenger RNAs; i.e., altering the "Shine/Dalgarno" region
preceding a particular initiator codon lowered or abolished binding of
ribosomes to that site (7,8). By contrast, eukaryotic ribosomes appear to be more
tolerant of sequence changes within the region of a message preceding the
initiator codon (9-12). Comparison of the 5'-proximal nucleotide sequences of
eukaryotic messages reveals remarkable heterogeneity, even among some closely
related mRNAs (13-15), reinforcing the impression that eukaryotic ribosomes
might not require a particular sequence to demarcate the initiation site. Based 
in part on observations such as these, I have hypothesized that eukaryotic 
ribosomes bind initially at the 5'-end of a message and then migrate, scanning
the mRNA sequence until they encounter the first AUG triplet which, solely by 
virtue of its position, is the initiator codon. Considerable evidence has been 
adduced in favor of this "scanning mechanism" (16,17). A compilation in Dec.,
1980, revealed that in 90 out of 99 eukaryotic messages which had been
sequenced, translation does indeed begin at the AUG triplet which is closest to the
5'-terminus (17). Although it is gratifying that 90% of the messages examined
conform to the prediction, the 9 exceptional messages (a number which has since 
grown to 11) in which translation does not start at the first AUG have been 
puzzling. The scanning model, in its simplest form, states that the initiator 
codon is recognized by its position (i.e., first-in-line) irrespective of the 
flanking sequences. But as sequence information has become available from more 
and more eukaryotic messages, it has become obvious that the nucleotides
flanking the initiator codon are not random. The data compiled in this paper show
that nucleotides in positions -3 and +4 are highly conserved. (The numbering
system used here is XpXpXpApUpGpX, in which the first X is position -3, the A
of the AUG codon is position +1, and the final X is position +4.) To determine
whether the conserved nucleotides play a role during initiation, I have
constructed a series of AUG-
containing oligonucleotides and measured their ability to bind to wheat germ 
ribosomes in vitro. The binding efficiency of the oligonucleotides was signif- 
icantly enhanced by placing a purine in position -3 or +4. A slightly more 
elaborate version of the scanning model, which takes these new data into
account, may provide an explanation for those exceptional eukaryotic messages in
which initiation is not restricted to the 5'-proximal AUG codon.
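
Because the numbering system has no position 0 (the A of AUG is +1 and the preceding nucleotide is -1), converting a position number to a string index takes a little care. The helper below is a Python sketch with an assumed name, `base_at`, written for illustration only.

```python
def base_at(mrna, aug_index, position):
    """Return the base at a numbered position relative to an AUG codon.

    aug_index is the 0-based index of the A of the AUG in mrna.
    Positions follow the convention in the text: +1 is the A of AUG,
    -1 is the nucleotide immediately preceding it, and there is no 0.
    """
    if position == 0:
        raise ValueError("there is no position 0 in this numbering system")
    offset = position - 1 if position > 0 else position
    index = aug_index + offset
    if index < 0 or index >= len(mrna):
        raise IndexError("position lies outside the sequence")
    return mrna[index]
```

For the heptanucleotide `ACCAUGG`, with the AUG at index 3, positions -3, +1 and +4 return A, A and G respectively.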

MATERIALS AND METHODS 

Synthesis and characterization of oligonucleotides 

To simplify the representation of nucleotide sequences, the designation p*
is used for [32P]; Y = pyrimidine; R = purine; X = any one of the four common
ribonucleotides. All of the di- and trinucleotides used in this work were
purchased from P-L Biochemicals except for GpUpG, which was from Boehringer
Mannheim, and CpCpC, which was from Collaborative Research.
Stepwise synthesis of oligonucleotides varying in position -3 

The oligonucleotide ApUpGpCp was first synthesized by ligation of [5'-32P]pCp
to ApUpG. The donor [5'-32P]pCp was prepared in a reaction containing 40 mM
Tris-HCl (pH 8.5), 10 mM dithiothreitol, 10 mM MgCl2, 2 mM spermine, 1 µg
bovine serum albumin, 0.2 mM 3'-CMP, 300 µCi of γ-32P-ATP (New England Nuclear,
specific activity adjusted to 300 Ci/mmol), and 3 U of polynucleotide kinase
(P-L Biochemicals). After incubation at 37°C for 40 min, the reaction was
terminated by boiling for 2.5 min. Following inactivation of the polynucleotide
kinase, the solution containing [5'-32P]pCp was used immediately in a reaction
with RNA ligase. The 40 µl reaction mixture contained 20 µl of (boiled) kinase
reaction, 0.6 OD260 units of ApUpG triplet, 15 U of RNA ligase (from T4
phage-infected E. coli; P-L Biochemicals product 0880), 0.13 mM ATP, 10%
dimethylsulfoxide (Mallinckrodt), additional MgCl2 to give a final concentration
of 20 mM, and additional Tris-HCl to give a final concentration of 50 mM.
Incubation was carried out at 4°C for 18-20 hr, as recommended by Bruce and
Uhlenbeck (18). Unless otherwise indicated in the text, these reaction conditions
permitted quantitative ligation of the 32P-labeled donor to the acceptor
oligonucleotide. The product of the ligase reaction was purified by phenol
extraction followed by electrophoresis on Whatman 3 MM paper in pyridine/acetate
buffer at pH 3.5. ApUpGpCp migrates slightly faster than the xylene cyanol
marker. The 32P-labeled oligonucleotide was eluted from the paper with water,
further purified by chromatography on Bio-Gel P-2, then stored in water at -70°C.
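
Since half of the ligase mixture consists of the boiled kinase reaction, the amounts of "additional" MgCl2 and Tris-HCl follow from simple dilution arithmetic. The Python sketch below works this out under the assumption that no other component of the ligase mixture contributes magnesium or Tris; the function names are hypothetical.

```python
def carried_over_mM(stock_mM, carried_volume, final_volume):
    """Concentration contributed by the carried-over portion after dilution.

    The two volumes may be in any unit, as long as it is the same unit.
    """
    return stock_mM * carried_volume / final_volume


def extra_needed_mM(target_mM, stock_mM, carried_volume, final_volume):
    """Additional concentration that must be supplied to reach the target."""
    return target_mM - carried_over_mM(stock_mM, carried_volume, final_volume)


# Kinase reaction: 10 mM MgCl2 and 40 mM Tris-HCl; 20 volumes out of 40
# are carried into the ligase reaction (final 20 mM MgCl2, 50 mM Tris-HCl).
mg_extra = extra_needed_mM(20, 10, 20, 40)
tris_extra = extra_needed_mM(50, 40, 20, 40)
```

On these assumptions the carried-over kinase reaction supplies 5 mM MgCl2 and 20 mM Tris-HCl, so roughly 15 mM and 30 mM, respectively, would have to be added.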

Sequential kinase/ligase reactions were carried out a second time to
obtain XpCpCpApUpGpCp, where X = C, A or G. The tetranucleotide ApUpGpCp from
the preceding step was first phosphorylated by incubation with polynucleotide
kinase and 0.2 mM (nonradioactive) ATP; the resulting pApUpGpCp was used as
donor in a reaction with RNA ligase. Three ligase reactions were carried out,
using as acceptor either CpCpC, ApCpC or GpCpC; reaction conditions were as
described in the preceding paragraph. After phenol extraction, the 32P-labeled
heptanucleotides were recovered by precipitation from 70% ethanol.

The kinase/ligase reactions were repeated a third time to obtain
CpCpCpXpCpCpApUpGpCp. The heptanucleotide ApCpCpApUpGpCp, GpCpCpApUpGpCp or
CpCpCpApUpGpCp was first phosphorylated in a standard kinase reaction with
nonradioactive ATP. In the subsequent ligase reaction, pXpCpCpApUpGpCp (X = C, A
or G) served as the donor, with CpCpC (0.6 OD260 units/25 µl reaction) as acceptor.
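
The three rounds of synthesis can be followed on paper by treating each ligation as a concatenation of acceptor and donor. The Python sketch below is only a bookkeeping aid: it writes sequences as plain base strings, ignores the terminal and internucleotide phosphates, and uses invented names (`ligate`, `step1`, `step2`, `step3`).

```python
def ligate(acceptor, donor):
    """Join the 3' end of the acceptor to the 5'-phosphorylated donor."""
    return acceptor + donor


# Round 1: ApUpG + pCp gives ApUpGpCp
step1 = ligate("AUG", "C")

# Round 2: XpCpC + pApUpGpCp gives XpCpCpApUpGpCp, for X = C, A or G
step2 = {x: ligate(x + "CC", step1) for x in "CAG"}

# Round 3: CpCpC + pXpCpCpApUpGpCp gives CpCpCpXpCpCpApUpGpCp
step3 = {x: ligate("CCC", hepta) for x, hepta in step2.items()}
```

In each final decanucleotide the variable nucleotide X sits three bases upstream of the AUG, which is why this series varies only in position -3.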

The size and purity of various oligonucleotides was checked by
homochromatography on DEAE thin layer plates at 60°C using homomixture c (19).
Autoradiography was carried out at room temperature using Kodak BB-1 film.
Synthesis of oligonucleotides varying in position +4 

32P-Labeled pentanucleotides of the form ApApUpGpX (where X = C, A, G or U)
were synthesized by ligating p*GpX to ApApU. The 5'-phosphorylated dinucleotide
donor was first prepared in a reaction with polynucleotide kinase and
γ-32P-ATP. Reaction conditions were as described above for synthesis of pCp,
except that 3'-CMP was replaced with either GpC, GpA, GpG or GpU, each at a
concentration of 10 µg/25 µl reaction. After the standard boiling procedure
to inactivate kinase, the solution containing [5'-32P]pGpX was used in a ligase
reaction, with ApApU (0.6 units/25 µl reaction) as acceptor. These ligase
reactions were incomplete, and therefore at the end of the incubation the
32P-labeled pentanucleotides were separated from unreacted donor by preparative
electrophoresis on 3MM paper. The pentanucleotides were eluted with water,
and a portion of each sample was used directly for ribosome binding. Another
aliquot was used as acceptor in a second ligase step, with nonradioactive pCp
as donor. This generated a series of hexanucleotides of the form ApApUpGpXpCp,
where X = C, A, G or U.

The structure of the pentanucleotides was confirmed by a series of enzymatic
digestions: P1 nuclease yielded 5'-32P-GMP; T2 RNase yielded 3'-32P-UMP;
pancreatic RNase yielded a 32P-labeled product that co-migrated with ApApUp on
DEAE paper at pH 3.5. The mobilities of the intact dinucleotides and
pentanucleotides on 3MM paper at pH 3.5 (see Fig. 2) are consistent with their
composition; that is, the mobilities follow the expected gradient U>G>A>C.
Binding of oligonucleotides to wheat germ ribosomes

The ability of 32P-labeled oligonucleotides to form 80S initiation complexes
was measured using conditions that were previously employed with natural
messenger RNAs (20). The wheat germ S23 extract was supplemented with 1 mM ATP,
0.24 mM GTP, 200 µM sparsomycin, and other components as described (20). The
final concentration of magnesium was 2.8 mM. After incubation for 10 min at
19°C, samples were chilled and layered onto 10-30% glycerol gradients.
Centrifugation was for 3 hr, 39,000 rpm, 4°C in the Beckman SW41 rotor. Gradient
fractions (0.4 ml) were mixed with 0.4 ml of water and 8 ml of Beckman HP
scintillation cocktail, and were counted to determine the distribution of
radioactivity.
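The binding measurement described above reduces to a simple ratio: radioactivity in the gradient fractions that contain 80S ribosomes, divided by the total radioactivity recovered from the gradient. The following Python sketch illustrates that arithmetic only. The function name, the fraction counts and the position of the 80S peak are all hypothetical values chosen for illustration; they are not data from this study.

```python
def percent_bound(fraction_cpm, ribosome_fractions):
    """Return the percent of recovered radioactivity found in the 80S peak.

    fraction_cpm       -- counts per minute for each gradient fraction
    ribosome_fractions -- indices of the fractions taken to be the 80S peak
    """
    total = sum(fraction_cpm)
    if total == 0:
        raise ValueError("no radioactivity recovered from the gradient")
    bound = sum(fraction_cpm[i] for i in ribosome_fractions)
    return 100.0 * bound / total


# Hypothetical counts (cpm) for a ten-fraction gradient; invented for illustration.
cpm = [50, 60, 400, 900, 350, 80, 60, 2000, 15000, 11100]
# Assumed position of the 80S peak (fractions 3-5, zero-based indices 2-4).
peak = [2, 3, 4]

print(percent_bound(cpm, peak))
```

In practice the peak fractions would be identified from the absorbance profile of the gradient rather than fixed in advance.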

RESULTS 

A survey of nucleotides flanking the initiator codon in eukaryotic mRNAs 

If nucleotides bordering the AUG codon are involved in defining the ribo- 
some binding site, comparison of a large number of initiation sites might re- 
veal a conserved sequence. The simplest approach is to examine the frequency 
of occurrence of a given nucleotide in each position to the left and right of
the AUG codon. Figure 1 presents such an analysis for 106 eukaryotic messages.
The region selected for analysis encompasses 15 nucleotides preceding the
initiator codon, and the first 15 nucleotides of the coding region. This is
the portion of the message that would be protected against nuclease by an 80S
ribosome bound at the initiator AUG. There are a few remarkable features in
the nucleotide distribution shown in Figure 1. Within the noncoding region
(residues -1 to -15), the greatest disproportion occurs close to the AUG triplet:
80% of eukaryotic messages have an A in position -3, and C residues occur
with high frequency in positions -1, -2, -4 and -5. The G content is unusually
low throughout region -1 to -15. In the coding region depicted in Figure 1,
the nucleotide distribution is approximately random except for positions +4
(60% G), +5 (44% C) and +6 (42% U). The significance of these asymmetries is
not immediately obvious, nor is the problem easily approached experimentally. 
To simplify the task, I have focused in this study on the two positions which 
show the greatest deviation: position -3, which is a purine (most often A) in 
94% of the messages examined; and position +4, which is also a purine (most 
often G) in 83% of eukaryotic messages. 
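The positional survey described above can be sketched in a few lines of Python. The sketch assumes that each initiation site is supplied as an RNA string of equal length with a known number of nucleotides preceding the AUG, and it uses the numbering of this paper (the A of AUG is +1 and there is no position 0). The function name and the four example sequences are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def position_frequencies(sites, upstream):
    """Percent occurrence of C, A, G and U at each position around the AUG.

    sites    -- equal-length RNA strings, each containing the initiator AUG
    upstream -- number of nucleotides that precede the AUG in each string
    Positions are numbered -upstream..-1, then +1 onward (no position 0).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    freqs = {}
    for i in range(length):
        pos = i - upstream if i < upstream else i - upstream + 1
        counts = Counter(s[i] for s in sites)
        freqs[pos] = {nt: 100.0 * counts[nt] / len(sites) for nt in "CAGU"}
    return freqs


# Invented example sites covering positions -3 to +4 (not taken from Table 1).
sites = ["ACCAUGG", "AAAAUGG", "GCCAUGA", "AUCAUGC"]
freqs = position_frequencies(sites, upstream=3)

print(freqs[-3]["A"])  # percent of sites with A in position -3
print(freqs[4]["G"])   # percent of sites with G in position +4
```

A value well above the 25% expected on a random basis, as seen for A in position -3 and G in position +4 in Figure 1, flags a position as non-random.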

Figure 1. Frequency of occurrence of cytosine (C), adenine (A), guanine (G)
and uracil (U) in positions -1 to -15 (preceding the AUG initiator codon),
and in coding positions +4 to +15. The AUG triplet is numbered +1 to +3.
The dotted line across each panel indicates the 25% value that would be
expected on a random basis. For these calculations, I used 90 eukaryotic
messages tabulated in reference 17, and 16 additional sequences given in
references 21-37. I did not include the human β- and δ-globin sequences, which
are very similar to rabbit β-globin; the minor species of mouse β-globin,
which is nearly identical to mouse β-globin major; and sequences from the
New Jersey strain of vesicular stomatitis virus, which are similar to the
Indiana strain. SV40 late 16S mRNA was also omitted because of ambiguity in
identifying the initiator codon for VP1.

In attempting to assign a function to these conserved nucleotides, one is
confronted by the problem that some messages (albeit a minority) lack the
"consensus sequence." One solution that comes to mind is that, in order to
function as an efficient initiation signal, an AUG codon must be flanked by
either a purine in position -3, or a G in position +4. In other words, at
least one of the favored flanking nucleotides is required. To evaluate this
idea, I have surveyed and grouped 153 eukaryotic messages on the basis of 
nucleotides occurring in positions -3 and +4. The data are presented in Table 
1. (I attempted to include all eukaryotic messages for which adequate sequence 
data have been published, and in which the functional initiator codon has been 
identified. Although sequences bordering the AUG codon have been determined 
for many messages, in some cases the remainder of the 5'-untranslated sequence
is not known. Thus, the number of mRNAs used in compiling Table 1 is larger
than the number used in Figure 1. In the case of closely related genes, only
one member of the set was chosen for Figure 1, whereas Table 1 is more
inclusive.) Of the 153 messages listed in Table 1, 120 have an A in position -3.
The largest group (64 messages) has the sequence AXXAUGG, but AXXAUGA (32 mRNAs)
and AXXAUGY (24 mRNAs) also occur with high frequency at functional initiation
sites. The table lists 15 messages in which the initiation site has the
sequence GXXAUGG, and another 4 with the sequence GXXAUGA. Thus, in 91% of the
messages surveyed (139 out of 153), one finds either an A in position -3 (with
G, A or Y in position +4), or one finds a G in position -3 and a purine in
position +4. Judging from their high frequency of occurrence, those sequences 
must function well. The infrequency of finding GXXAUGY or YXXAUGX, on the 
other hand, might mean that an AUG triplet flanked by those nucleotides func- 
tions poorly as an initiation signal. 
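The tallies quoted in this paragraph can be checked with a short calculation. The counts below are those stated in the text; the variable names are arbitrary and the snippet is offered only as an arithmetic check, not as part of the original analysis.

```python
# Number of messages in each sequence class, as stated in the text.
counts = {
    "AXXAUGG": 64,
    "AXXAUGA": 32,
    "AXXAUGY": 24,
    "GXXAUGG": 15,
    "GXXAUGA": 4,
}
total_messages = 153

# Messages with A in position -3 (any nucleotide in position +4).
a_in_minus3 = sum(n for seq, n in counts.items() if seq.startswith("A"))

# Messages with A in -3, or with G in -3 together with a purine in +4.
favored = sum(counts.values())
favored_percent = 100.0 * favored / total_messages

print(a_in_minus3, favored, round(favored_percent))
```

The sums reproduce the 120 messages with A in position -3 and the 139 of 153 (91%) that fall into the favored classes.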

It is intriguing to note that, in the eleven unusual messages in which 
translation does not begin at the first AUG, the pattern of nucleotides flank- 
ing the nonfunctional upstream AUG triplets (right side of Table 1) is differ- 
ent from that found at most functional initiation sites. The nonfunctional 
AUG triplets which occur within the 5'-noncoding regions are clustered in the
lower right quadrant of Table 1; i.e., they are bordered by nucleotides which
are rarely seen around functional initiator codons. The only serious
exceptions are SV40 16S and 19S mRNAs, in which the upstream AUG triplets in the
leader are flanked by GXXAUGG, a sequence frequently found at functional
initiation sites. But the first AUG in SV40 16S mRNA really should not be listed
with the "nonfunctional" AUGs in the right side of Table 1. It was recently
shown that ribosomes do initiate at that site, translating the so-called
agnogene (67). The mechanism by which ribosomes are also able to initiate at a
downstream AUG triplet to make the VP1 protein is not understood; but the ideas
proposed in this paper are not contradicted by finding "good" sequences
flanking the first AUG in SV40 16S mRNA. The upstream AUG in that message is
functional. The only other entry in the GXXAUGG column is the sixth AUG in the
poliovirus genome. As explained in the footnote to Table 1, two different
sequences were reported for that site. Until the ambiguity can be cleared up,
it seems fair to omit AUG #6 from further discussion. With these caveats, the
data presented in Table 1 suggest the generalization that nonfunctional AUG
triplets, which are found in the 5'-noncoding region of a few eukaryotic
messages, are bordered by sequences (GXXAUGY or YXXAUGX) which differ from those
found around most functional initiator codons. 
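The grouping used in this survey can be written as a small classification rule: a site counts as favorable if it has A in position -3, or G in position -3 together with a purine in position +4; GXXAUGY and YXXAUGX count as unfavorable. The Python sketch below encodes that rule. The function name and the labels "favorable" and "unfavorable" are illustrative choices, and the dividing line between the two classes is itself somewhat arbitrary, as footnote 1 points out. The three test sequences are initiation sites quoted elsewhere in this paper.

```python
PURINES = {"A", "G"}


def classify_context(site):
    """Classify a 7-nucleotide site spanning positions -3 to +4.

    The AUG must occupy the fourth to sixth characters of `site`.
    """
    if len(site) != 7 or site[3:6] != "AUG":
        raise ValueError("expected a 7-nt site of the form NNNAUGN")
    minus3, plus4 = site[0], site[6]
    if minus3 == "A":
        return "favorable"      # AXXAUGG, AXXAUGA, AXXAUGY
    if minus3 == "G" and plus4 in PURINES:
        return "favorable"      # GXXAUGG, GXXAUGA
    return "unfavorable"        # GXXAUGY, YXXAUGX


for site in ("AUAAUGA", "UCCAUGG", "CGUAUGG"):
    print(site, classify_context(site))
```

Under this rule the wild-type yeast cytochrome c site (AUAAUGA) is favorable, whereas the SV40 VP2 site (UCCAUGG) and the putative herpes thymidine kinase site (CGUAUGG) are unfavorable because of the pyrimidine in position -3.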

The main conclusion from this survey is that nucleotides flanking func- 
tional initiator codons in eukaryotic messages are not random. Purines occur 
with very high frequency in positions -3 and +4. As a first step in determin- 
ing whether the conserved flanking nucleotides play a role during initiation, 
I have carried out in vitro ribosome binding studies using various synthetic 
oligonucleotides.

Binding of 32P-labeled oligonucleotides to wheat germ ribosomes
Effect of varying the nucleotide in position +4 

Oligonucleotides of the form ApApUpGpX (where X = C, A, G or U) were
constructed by ligating pGpX to the triplet ApApU. The efficiency of ligation
was approximately 40% for reactions with pGpC and pGpA, and somewhat less than
that for reactions with pGpG and pGpU, as shown in Figure 2. The 32P-labeled
pentanucleotides, purified by electrophoretic fractionation, were incubated with
wheat germ ribosomes under conditions that permitted formation of 80S initiation
complexes. As shown in Table 2 (experiments 1 and 2), the efficiency of binding
varied from 0.5% for ApApUpGpU, to 7-10% for ApApUpGpG.

The pentanucleotide series was converted to hexanucleotides by ligating pCp
to the 3'-terminus. The chromatographic analysis shown in Figure 3 reveals that
the ligation was quantitative: all 32P-radioactivity was converted to the
slower-migrating hexanucleotide position. The overall efficiency of binding to
ribosomes was higher with the hexanucleotide series than with the
pentanucleotides. Table 2 (experiment 3) shows that, within the hexanucleotide series
ApApUpGpXpCp, varying the nucleotide in position +4 produced a gradient in
binding efficiency: G>A>C>U. A similar gradient was not observed with the
control series ApApUpGpUpXp (X = C, A, G or U; experiment 4). Thus, the
efficiency of binding was markedly enhanced by placing a purine in position +4,
but not in position +5.
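The gradient reported for the hexanucleotide series can be reproduced from the experiment 3 values in Table 2 (percent bound for ApApUpGpXpCp with X = C, A, G or U). The short Python sketch below ranks the four nucleotides and compares the mean binding for purines and pyrimidines. The averaging step is added here only as an illustration; it is not a calculation made in the original analysis.

```python
# Percent bound for ApApUpGpXpCp, keyed by the nucleotide X in position +4
# (Table 2, experiment 3).
binding = {"C": 12, "A": 18, "G": 23, "U": 6}

# Rank the nucleotides from highest to lowest binding.
ranked = sorted(binding, key=binding.get, reverse=True)

# Compare purines (A, G) with pyrimidines (C, U).
purine_mean = (binding["A"] + binding["G"]) / 2
pyrimidine_mean = (binding["C"] + binding["U"]) / 2
ratio = purine_mean / pyrimidine_mean

print(">".join(ranked))
print(purine_mean, pyrimidine_mean, round(ratio, 1))
```

The ranking recovers the order G>A>C>U stated in the text; the best and worst templates in the series (G and U in position +4) differ roughly 4-fold.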



Nucleic Acids Research 




Figure 2. Synthesis of ApApUpGpX series of oligonucleotides. Odd-numbered
lanes show the 32P-labeled dinucleotides formed in the first-step kinase
reactions: lane 1, pGpC; lane 3, pGpA; lane 5, pGpG; lane 7, pGpU. The
even-numbered lanes show the products of the second-step ligase reactions,
with nonradioactive ApApU as acceptor: lane 2, ApApUpGpC; lane 4, ApApUpGpA;
lane 6, ApApUpGpG; lane 8, ApApUpGpU. The pentanucleotide is the slower spot
in lanes 2, 4, 6 and 8; the faster spot in each lane represents residual
unligated dinucleotide. Products were fractionated by electrophoresis on
Whatman 3MM paper at pH 3.5. An autoradiogram is shown. XC, xylene cyanol
marker. The origin is at the bottom.



TABLE 2. Binding of 32P-labeled oligonucleotides to wheat germ 80S ribosomes:
effect of varying the nucleotide in position +4 or +5

Experiment   32P-labeled pentanucleotide   Percent bound (a)
1            ApApUpGpC                     3
             ApApUpGpA                     4
             ApApUpGpG                     7
             ApApUpGpU                     0.5
2            ApApUpGpU                     0.5
             ApApUpGpG                     9.5
             ApGpUpGpG (control)           0

Experiment   32P-labeled hexanucleotide    Percent bound (a)
3            ApApUpGpCpCp                  12
             ApApUpGpApCp                  18
             ApApUpGpGpCp                  23
             ApApUpGpUpCp                  6
4            ApGpUpGpGpCp (control)        0.3
             ApApUpGpGpCp                  24
             ApApUpGpUpCp                  5
             ApApUpGpUpAp                  1
             ApApUpGpUpGp                  3
             ApApUpGpUpUp                  1



(a) After incubation for 10 min at 19°C, each 50 µl reaction mixture was
centrifuged through a glycerol gradient. The 32P-radioactivity co-sedimenting
with 80S ribosomes is shown as a percent of the total radioactivity recovered.
Each sample contained 30,000 cpm of 32P-oligonucleotide.

Figure 3. Addition of pCp to the 3'-end of the pentanucleotide series
ApApUpGpX. The 32P-labeled pentanucleotides ApApUpGpC, ApApUpGpA, ApApUpGpG
and ApApUpGpU are shown in lanes 1, 3, 5 and 7, respectively. The products
obtained after ligation to (nonradioactive) pCp are shown in lanes 2, 4, 6
and 8. Fractionation was by homochromatography on a DEAE-cellulose thin
layer plate. The origin is at the bottom.



Effect of varying the nucleotide in position -3 

The range of oligonucleotides that could be constructed for this study 
was limited by availability of the required trinucleotide precursors and, more 
importantly, by the specificity of RNA ligase. As reported previously (79,80), 
the enzyme has marked sequence preferences with respect to both donor and 
acceptor molecules. Molecules with 5'-terminal cytidine are, by far, the most
efficient donors in reactions catalyzed by T4 RNA ligase. The best
trinucleotide acceptor is ApApA; ApCpC functions adequately as acceptor if the enzyme
and substrate concentrations are high; ApUpU and UpUpU were very inefficient
in preliminary experiments, and therefore they were abandoned. Acceptor
activity depends not only on the 3'-terminal residue, but also on the adjacent
nucleotides.

To test the idea that a purine in position -3 might enhance binding of 
AUG-containing oligonucleotides, I first constructed a series of
heptanucleotides of the form XpCpCpApUpGpCp, where X = C, A or G. These were prepared
by ligating 32P-labeled pApUpGpCp to either CpCpC, ApCpC or GpCpC. Figure 4
(lanes 2-4) shows that all of the 32P-labeled donor was converted to
slower-migrating heptanucleotides. When these oligonucleotides were tested for
ability to bind to wheat germ ribosomes, ApCpCpApUpGpCp and GpCpCpApUpGpCp were
slightly more efficient than CpCpCpApUpGpCp, as shown in Table 3.

It seemed possible that the stabilizing effect of a purine in position -3
might be more obvious if the purine were not right at the end of the
oligonucleotide. To obtain longer templates, I ligated pXpCpCpApUpGpCp (X = C, A or G)
to CpCpC. The reactions with pCpCpCpApUpGpCp and pApCpCpApUpGpCp proceeded to
completion, as shown in Figure 4, lanes 6 and 7. With pGpCpCpApUpGpCp as donor,
however, the extent of joining was only 40% after the first ligation (Figure 4,
lane 5). The reaction with pGpCpCpApUpGpCp was complete after a second
incubation with RNA ligase (data not shown). The structure of the oligonucleotides
(Cp)6ApUpGpCp and (Cp)3ApCpCpApUpGpCp was confirmed by analyzing the partial T2
RNase digestion products (Figure 5). Ribosome binding experiments were carried
out with (Cp)6ApUpGpCp, (Cp)3ApCpCpApUpGpCp and (Cp)3GpCpCpApUpGpCp. As
shown in Table 3, the extent of binding was 7 to 9-fold higher for
oligonucleotides with a purine in position -3. Although this effect was readily
demonstrated using the small oligonucleotides described above, with longer
templates, such as (Cp)12ApUpGpCp, it was more difficult to show dependence on a
purine in position -3. The problem is that longer oligonucleotides bound with
considerably higher efficiency (25 to 45%) seemingly due to just their length.
Not surprisingly, it is difficult to demonstrate a stabilizing effect due to
changing one nucleotide when binding has already been dramatically enhanced by
the length-effect.

Figure 4. Sequential joining of 32P-labeled pApUpGpCp to XpCpC (X = G, A or C),
and then to CpCpC. Lane 1, ApUpGpC; lane 2, GpCpCpApUpGpC; lane 3,
ApCpCpApUpGpC; lane 4, CpCpCpApUpGpC; lane 5, CpCpCpGpCpCpApUpGpC (slower
spot) and residual GpCpCpApUpGpC; lane 6, CpCpCpApCpCpApUpGpC; lane 7,
(Cp)6ApUpGpC. The reactions catalyzed by RNA ligase are described in Materials
and Methods. Fractionation was by homochromatography on a DEAE-cellulose thin
layer plate. The origin is at the bottom. To enhance the resolution, terminal
phosphates were removed by incubating all oligonucleotides with bacterial
alkaline phosphatase prior to chromatography.

TABLE 3. Binding of 32P-labeled oligonucleotides to wheat germ 80S ribosomes:

32P-labeled oligonucleotide     Percent of input oligonucleotide bound (a)
                                Experiment 1   Experiment 2   Experiment 3

ApUpGpCp
CpCpCpApUpGpCp
ApCpCpApUpGpCp
GpCpCpApUpGpCp

ApCpCpApUpUpCp
ApCpCpGpUpGpCp

CpCpCpCpCpCpApUpGpCp
CpCpCpApCpCpApUpGpCp
CpCpCpGpCpCpApUpGpCp

3
3
5
6

n.d.
3
5
6

1
9

n.d.

n.d.

3

5
n.d.

0.2
0.2

1
7
9

(a) The experiments were carried out as described [...]

Three independent preparations of oligonucleotides were tested in experiments
1, 2 and 3. n.d., not done.

Figure 5. Autoradiogram of products derived by partial hydrolysis of
(Cp)6ApUpGp (lanes 1,2) and (Cp)3ApCpCpApUpGp (lanes 4,5). Samples were
incubated with T2 RNase (5 U/ml) for 15 min (lanes 1,4) or 45 min (lanes 2,5).
Partial digestion products were fractionated by homochromatography on
DEAE-cellulose. The uniform spacing between spots 3 through 9 in lane 2 is as
expected for (Cp)6ApUpGp. In lanes 4 and 5, the larger shift between spot 5
(CpCpApUpGp) and spot 6 (ApCpCpApUpGp) is consistent with the proposed
sequence. [In a homologous partial digestion series, the distance between
two oligonucleotides which differ by a purine nucleotide is always larger than
that between two oligonucleotides which differ by a pyrimidine nucleotide
(81).] The markers in lane 3 are Gp, ApUpGp and (Cp)6ApUpGp. Prior to carrying
out the analysis shown in this figure, the oligonucleotides were digested with
T1 RNase to remove 3'-terminal Cp, leaving 32P directly at the 3'-end of the
T1-derived product. This simplified identification of partial digestion
products subsequently obtained with T2 RNase.

The controls listed in Tables 2 and 3 indicate that, under the conditions 
of these experiments, only AUG-containing oligonucleotides were able to bind
to ribosomes. ApGpUpGpG and ApGpUpGpGpCp (Table 2), as well as ApCpCpGpUpGpCp
and ApCpCpApUpUpCp (Table 3) showed negligible binding.

DISCUSSION 

A survey of sequences flanking the initiator codon in eukaryotic messenger 
RNAs reveals that almost all functional AUG triplets are preceded by a purine 
(usually A) in position -3. A high proportion of functional initiation sites 
also have a purine (usually G) following the AUG codon. From the data
summarized in Table 1, (A/G)XXAUGG emerges as the favored sequence for eukaryotic
initiation sites. Of the 153 messages included in the survey, only 11 have a
pyrimidine in position -3, and in 9 of those the AUG codon is followed by G. Thus,
only 2 putative initiation sites (see footnote 2), those of SV40 VP1 and brome
mosaic virus RNA-3, lack both a purine in position -3 and a G residue in
position +4. The sequence (A/G)XXAUGG which characterizes most functional
initiation sites is never observed among the nonfunctional AUGs found in the
5'-noncoding region of eukaryotic messages. It is obvious from the data in
Table 1, however, that differences in sequences flanking the AUG triplet do not
categorically distinguish functional from nonfunctional sites. Although most
functional initiator
codons are preceded by a purine in position -3 and most nonfunctional AUGs have 
a pyrimidine in that position, the sequences GXXAUGY and YXXAUGG occur at a 
small number of functional sites as well as at (presumably) nonfunctional AUGs 
within the leader region of poliovirus and Rous sarcoma virus RNAs. A mechan- 
ism of initiation compatible with this nonunique distribution is outlined be- 
low. The main conclusions from the survey presented in Table 1 are (a) that 
nucleotides flanking functional initiator codons (particularly in positions -3 
and +4) are not random; and (b) that nucleotides flanking nonfunctional AUG 
triplets which occur within the 5'-untranslated region of a few eukaryotic
messages are different from those bordering most functional initiator codons.

This asymmetry suggests that purines in positions -3 and +4 might facili- 
tate recognition of the AUG codon during formation of initiation complexes. 
The idea gains support from the oligonucleotide binding studies described above. 
The extent of binding to wheat germ ribosomes was increased several-fold by
placing a purine, rather than a pyrimidine, in position +4 (Table 2). The
facilitating effect of a purine in position -3 was also readily demonstrated
(Table 3), particularly with the series CpCpCpXpCpCpApUpGpCp (X = C, A or G).
Since only a small number of permutations were tested in this study, it might 
be that nucleotides in positions other than those tested also contribute to the 
stability of initiation complexes. But it is encouraging that differences in 
oligonucleotide binding can be detected upon varying the component in position 
-3 or +4; a purine seems to be preferred in both positions. 

These results cannot be interpreted in isolation. Although the data in 
this paper suggest that recognition of the initiator codon is influenced by the 
flanking sequences, the position of the AUG triplet (i.e., near the 5' -end of 
the message) still seems to be the primary determinant of a functional
initiation site (16,17). Within the interior of eukaryotic messenger RNAs the
sequence (A/G)XXAUGG occurs many times; ribosomes do not initiate at these internal
AUGs despite the favorable flanking sequence. In apparent contradiction to the
studies described above, some previous experiments seemed to indicate that
eukaryotic ribosomes select the AUG initiator codon without regard to the
flanking sequences. Sherman and colleagues (9) showed, for example, that when the
normal initiator codon in the yeast cytochrome c gene was inactivated by
mutation, introduction of a new AUG triplet almost anywhere within a 37-nucleotide
region restored translation. The sequences flanking the ectopic AUG initiator
codons in the pseudorevertants varied widely, and often did not correspond to
the optimal sequences defined above. How can those experiments be reconciled
with the new data described herein? The interpretation I favor is a modified
version of the scanning mechanism; namely, that flanking sequences (nucleotides
-3 and +4) modulate the efficiency with which the migrating 40S ribosomal
subunit recognizes an AUG triplet as a "stop signal." Some 40S subunits will stop
at the first AUG irrespective of the flanking sequences; if the nucleotides
bordering an AUG codon are optimal, virtually all 40S subunits will stop at
that AUG. Sherman's data are compatible with such a mechanism: some cytochrome
c is made in the pseudorevertants, indicating (only) that some 40S subunits
stop at the first AUG, even when it is flanked by suboptimal sequences. Those
experiments do not indicate that the ectopic AUG triplets in the
pseudorevertants function as efficiently as the wild-type sequence AUAAUGA presumably
does. Thus, the genetic experiments are not incompatible with a modified
scanning mechanism in which flanking sequences (see footnote 3) affect the
efficiency with which an AUG codon is recognized as a "stop signal."

The proposed mechanism has some interesting implications: 

(a) The scanning model in its simplest form (see Introduction) predicts that 
spurious AUG triplets cannot occur in the region preceding the functional ini- 
tiation site; recognition of the authentic initiator codon depends on its being 
first-in-line. This prediction is upheld by most, but not all, eukaryotic
messages. The modified scanning model, on the other hand, admits that ribosomes
can initiate at a downstream site, provided that all of the upstream AUGs are
flanked by "unfavorable" sequences such that some 40S ribosomes can get through.
The data in the right side of Table 1 provide encouragement for this idea. In 
those messages in which translation does not begin at the first AUG, almost 
all of the AUGs which are bypassed have a pyrimidine in position -3. 

(b) Although upstream AUGs are not an absolute barrier to initiating down- 
stream, it seems reasonable to expect that occurrence of AUG triplets within 
the 5'-noncoding region of a message would reduce translational efficiency,
since some 40S ribosomes would stop at each upstream AUG irrespective of the
flanking sequences. Consistent with this idea, poliovirus RNA is a notoriously
inefficient message in vitro (84), as is the Rous sarcoma virus genome (85).
The notion that upstream AUGs impair translational efficiency would explain why
90% of eukaryotic mRNAs have no AUGs in the 5'-noncoding region. Even if the
flanking sequences are such that some ribosomes can get through, upstream AUGs
probably have a deleterious effect. Parenthetically, the idea that some 40S
subunits stop and initiate at each upstream AUG rationalizes the finding that,
in those few messages which have AUG triplets in the "5'-noncoding region"
(the semantic difficulty is obvious), the upstream AUGs are almost always
followed closely by in-phase terminator codons (17). Thus, ribosomes which
initiate prematurely at an upstream AUG are returned quickly to circulation.

(c) The apparent inability of eukaryotic ribosomes to bind to sites in the 
interior of a message generally means that a eukaryotic mRNA can direct
synthesis of only one protein: that encoded nearest the 5'-end of the template.
But a single message might direct synthesis of two proteins if the AUG triplet
at the start of the first coding region were flanked by unfavorable sequences,
such that only some 40S subunits stop and initiate at that site while others
advance to the next AUG. Two examples come to mind: SV40 late 19S mRNA is
believed to direct synthesis of both VP2 and VP3 (75, 86); and the mRNA
encoding herpes thymidine kinase has been shown to direct synthesis of a second
smaller protein (87). In both messages the upstream initiator codon, which
seems to be "leaky," is preceded by a pyrimidine in position -3: the
initiation site for SV40 VP2 is UCCAUGG, and the putative initiation site for herpes
thymidine kinase is CGUAUGG. 
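The three implications listed above share one piece of arithmetic: if each AUG stops only a fraction of the 40S subunits that reach it, the remainder continue to the next AUG. The Python sketch below works through that bookkeeping. It is a minimal illustration of the modified scanning idea, not a model fitted to data. The function name is arbitrary, and the stop probabilities are hypothetical; the paper proposes that flanking sequences modulate these efficiencies but assigns them no numerical values.

```python
def initiation_distribution(stop_probabilities):
    """Fraction of scanning 40S subunits that initiate at each AUG.

    stop_probabilities -- probability that a subunit arriving at each AUG
                          (listed 5' to 3') stops there; hypothetical values.
    Returns (fractions, remaining), where `remaining` is the fraction of
    subunits that passed every AUG in the list.
    """
    remaining = 1.0
    fractions = []
    for p in stop_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("stop probabilities must lie between 0 and 1")
        fractions.append(remaining * p)   # subunits that stop at this AUG
        remaining *= 1.0 - p              # subunits that scan past it
    return fractions, remaining


# A "leaky" upstream AUG in an unfavorable context (assumed to stop 25% of
# subunits) followed by an AUG in an optimal context (assumed to stop all).
fractions, remaining = initiation_distribution([0.25, 1.0])
print(fractions, remaining)
```

With these assumed values, one quarter of the subunits initiate at the upstream site and three quarters reach the downstream site, which is the situation proposed for messages that direct synthesis of two proteins. Setting the first probability to 1.0 recovers the simple scanning model, in which no subunit reaches a downstream AUG.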

In summary, it seems likely that nucleotides in positions -3 and +4
influence recognition of the AUG codon by eukaryotic ribosomes, or one of the
ribosome-associated components involved in initiation. Binding of synthetic
oligonucleotides to wheat germ ribosomes was enhanced 5- to 15-fold by placing
a purine in either of those positions. This preference mirrors the observed
frequency of nucleotides flanking the initiator codon in natural mRNAs.
Although there is no direct evidence concerning the mechanism by which
nucleotides bordering the AUG codon facilitate initiation, the mechanism must be
compatible with a large body of evidence which suggests that ribosomes attach
to eukaryotic mRNAs at an upstream site, and migrate down to the AUG. I have
therefore proposed a modified version of the scanning model which postulates
that flanking nucleotides modulate the efficiency with which an AUG triplet
is recognized as a stop-signal by the migrating 40S subunit. The modified
scanning mechanism accounts for those few messages (eleven, to date) in which
translation does not begin at the AUG triplet closest to the 5'-terminus, and
also for rare messages which seem to direct synthesis of two
independently-initiated proteins.

FOOTNOTES

1 The suggested cut-off between efficient and inefficient initiation signals
has been set rather arbitrarily between GXXAUGA (which I have called
efficient) and GXXAUGY (which I have called inefficient). This division obviously
does not follow from the number of functional initiation sites which have those
sequences. Both sequences occur infrequently, and therefore both might be
viewed as likely-to-be-inefficient. On the other hand, there is no evidence
that the 4 messages initiating at GXXAUGA or the 3 messages initiating at
GXXAUGY (see Table 1) are defective; thus, one might argue that, despite their
infrequent occurrence, both GXXAUGA and GXXAUGY should be regarded as efficient.
To some extent, I have been guided by the oligonucleotide binding studies which
show a purine in position +4 to be much better than a pyrimidine in that site.

2 There is some uncertainty in pinpointing the initiator codon in these 2
messages. In the case of brome mosaic virus RNA-3, the putative initiation
site was identified by the open reading frame that follows, but only a limited
portion of that RNA has been sequenced (82). The difficulty in identifying
the initiator codon for SV40 VP1 has been discussed previously (17). It is not
known whether translation of VP1 in vivo begins at the first or second AUG
within the sequence CUUAUGAAGAUGGCC.

3 Although this paper emphasizes the role of the flanking primary sequence
in modulating recognition of the AUG codon, other experiments (83) suggest
that the secondary/tertiary structure of eukaryotic messages also contributes
to the fidelity of initiation. With denatured reovirus mRNA as template, 40S
ribosomes tend to migrate beyond the first AUG codon, despite the favorable
flanking sequences.



REFERENCES 

1. Steitz, J.A. (1979) in Biological Regulation and Development, Goldberger,
    R.F., Ed., pp. 349-399, Plenum Press, New York.
2. Borisova, G.P., Volkova, T.M., Berzin, V., Rosenthal, G. and Gren, E.J.
    (1979) Nucleic Acids Res. 6, 1761-1774.
3. Cannistraro, V.J. and Kennell, D. (1979) Nature 277, 407-409.
4. Fiil, N.P., Friesen, J.D., Downing, W.L. and Dennis, P.P. (1980) Cell 19,
    837-844.
5. Taniguchi, T. and Weissmann, C. (1978) J. Mol. Biol. 118, 533-565.
6. Atkins, J.F., Steitz, J.A., Anderson, C.W. and Model, P. (1979) Cell 18,
    247-256.
7. Dunn, J.J., Buzash-Pollert, E. and Studier, F.W. (1978) Proc. Natl. Acad.
    Sci. USA 75, 2741-2745.
8. Schwartz, M., Roa, M. and Debarbouille, M. (1981) Proc. Natl. Acad. Sci.
    USA 78, 2937-2941.
9. Sherman, F., Stewart, J.W. and Schweingruber, A.M. (1980) Cell 20, 215-222.
10. Ghosh, P.K., Lebowitz, P., Frisque, R.J. and Gluzman, Y. (1981) Proc. Natl.
    Acad. Sci. USA 78, 100-104.
11. Barkan, A. and Mertz, J.E. (1981) J. Virol. 37, 730-737.
12. Solnick, D. (1981) Cell 24, 135-143.






13. Firtel, R.A., Timm, R., Kimmel, A.R. and McKeown, M. (1979) Proc. Natl.
    Acad. Sci. USA 76, 6206-6210.
14. Montgomery, D.L., Leung, D.W., Smith, M., Shalit, P., Faye, G. and Hall,
    B.D. (1980) Proc. Natl. Acad. Sci. USA 77, 541-545.
15. Jones, C.W. and Kafatos, F.C. (1980) Cell 22, 855-867.
16. Kozak, M. (1980) Cell 22, 7-8.
17. Kozak, M. (1981) in Current Topics in Microbiology and Immunology,
    Shatkin, A.J., Ed., Vol. 93, pp. 81-123, Springer-Verlag, Berlin.
18. Bruce, A.G. and Uhlenbeck, O.C. (1978) Nucleic Acids Res. 5, 3665-3677.
19. Barrell, B.G. (1971) in Procedures in Nucleic Acid Research, Cantoni, G.L.
    and Davies, D.R., Eds., Vol. 2, pp. 751-779.
20. Kozak, M. and Shatkin, A.J. (1976) J. Biol. Chem. 251, 4259-4266.
21. Sakano, H., Maki, R., Kurosawa, Y., Roeder, W. and Tonegawa, S. (1980)
    Nature 286, 676-683.
22. Early, P., Huang, H., Davis, M., Calame, K. and Hood, L. (1980) Cell 19,
    981-992.
23. Bothwell, A.L.M., Paskind, M., Reth, M., Imanishi-Kari, T., Rajewsky, K.
    and Baltimore, D. (1981) Cell 24, 625-637.
24. Nishioka, Y. and Leder, P. (1980) J. Biol. Chem. 255, 3691-3694.
25. Tonegawa, S., Maxam, A.M., Tizard, R., Bernard, O. and Gilbert, W. (1978)
    Proc. Natl. Acad. Sci. USA 75, 1485-1489.
26. Kitamura, N., Semler, B.L., Rothberg, P.G., Larsen, G.R., Adler, C.J.,
    Dorner, A.J., Emini, E.A., Hanecak, R., Lee, J.J., van der Werf, S.,
    Anderson, C.W. and Wimmer, E. (1981) Nature 291, 547-553.
27. Racaniello, V.R. and Baltimore, D. (1981) Proc. Natl. Acad. Sci. USA,
    in press.
28. Czernilofsky, A.P., Levinson, A.D., Varmus, H.E., Bishop, J.M., Tischer, E.
    and Goodman, H.M. (1980) Nature 287, 198-203.
29. Rice, C.M. and Strauss, J.H. (1981) Proc. Natl. Acad. Sci. USA 78,
    2062-2066.
30. Garoff, H., Frischauf, A.M., Simons, K., Lehrach, H. and Delius, H. (1980)
    Proc. Natl. Acad. Sci. USA 77, 6376-6380.
31. Valenzuela, P., Gray, P., Quiroga, M., Zaldivar, J., Goodman, H.M. and
    Rutter, W.J. (1979) Nature 280, 815-819.
32. Fiddes, J.C. and Goodman, H.M. (1980) Nature 286, 684-687.
33. Hudson, P., Haley, J., Cronk, M., Shine, J. and Niall, H. (1981) Nature
    291, 127-131.
34. Law, S.W. and Dugaiczyk, A. (1981) Nature 291, 201-205.
35. Nakanishi, S., Teranishi, Y., Watanabe, Y., Notake, M., Noda, M.,
    Kakidani, H., Jingami, H. and Numa, S. (1981) Eur. J. Biochem. 115, 429-438.
36. Wieringa, B., Geert, A.B. and Gruber, M. (1981) Nucleic Acids Res. 9,
    489-501.
37. Heintz, N., Zernik, M. and Roeder, R.G. (1981) Cell 24, 661-668.
38. Richards, R.I. and Wells, J.R.E. (1980) J. Biol. Chem. 255, 9306-9311.
39. Friedmann, T., LaPorte, P. and Esty, A. (1978) J. Biol. Chem. 253,
    6561-6567.
40. Yang, R.C.A. and Wu, R. (1979) Virology 92, 340-352.
41. Goeddel, D.V., Leung, D.W., Dull, T.J., Gross, M., Lawn, R.M., McCandliss,
    R., Seeburg, P.H., Ullrich, A., Yelverton, E. and Gray, P.W. (1981)
    Nature 290, 20-26.
42. McKeown, M. and Firtel, R.A. (1981) Cell 24, 799-807.
43. Farabaugh, P.J. and Fink, G.R. (1980) Nature 286, 352-356.
44. Holland, M.J., Holland, J.P., Thill, G.P. and Jackson, K.A. (1981) J. Biol.
    Chem. 256, 1385-1395.
45. Scarpulla, R.C., Agne, K.M. and Wu, R. (1981) J. Biol. Chem. 256,
    6480-6486.






46. Matthyssens, G. and Rabbitts, T.H. (1980) Proc. Natl. Acad. Sci. USA 77,
    6561-6565.
47. Akusjarvi, G. and Persson, H. (1981) J. Virol. 38, 469-482.
48. Van Rompuy, L., Min Jou, W., Huylebroeck, D., Devos, R. and Fiers, W.
    (1981) Eur. J. Biochem. 116, 347-353.
49. Air, G.M. (1979) Virology 97, 468-472.
50. Drickamer, K., Kwoh, T.J. and Kurtz, D.T. (1981) J. Biol. Chem. 256,
    3634-3636.
51. Cooke, N.E., Coit, D., Shine, J., Baxter, J.D. and Martial, J.A. (1981)
    J. Biol. Chem. 256, 4007-4016.
52. Lin, Y. and Gross, J.K. (1981) Proc. Natl. Acad. Sci. USA 78, 2825-2829.
53. Herisse, J. and Galibert, F. (1981) Nucleic Acids Res. 9, 1229-1240.
54. Herisse, J., Courtois, G. and Galibert, F. (1980) Nucleic Acids Res. 8,
    2173-2192.
55. Blok, J. and Air, G.M. (1980) Virology 107, 50-60.
56. Fields, S., Winter, G. and Brownlee, G.G. (1981) Nature 290, 213-217.
57. Min Jou, W., Verhoeyen, M., Devos, R., Saman, E., Fang, R., Huylebroeck, D.,
    Fiers, W., Threlfall, G., Barber, C., Carey, N. and Emtage, S. (1980)
    Cell 19, 683-696.
58. Winter, G., Fields, S. and Brownlee, G.G. (1981) Nature 292, 72-75.
59. Porter, A.G., Barber, C., Carey, N.H., Hallewell, R.A., Threlfall, G. and
    Emtage, J.S. (1979) Nature 282, 471-477.
60. Wallis, J.W., Hereford, L. and Grunstein, M. (1980) Cell 22, 799-805.
61. Jenkins, J.R. (1979) Nature 279, 809-811.
62. Fyrberg, E.A., Bond, B.J., Hershey, N.D., Mixter, K.S. and Davidson, N.
    (1981) Cell 24, 107-116.
63. Reddy, E.P., Smith, M.J., Canaani, E., Robbins, K.C., Tronick, S.R., Zain,
    S. and Aaronson, S.A. (1980) Proc. Natl. Acad. Sci. USA 77, 5234-5238.
64. VanBeveren, C., Galleshaw, J.A., Jonas, V., Berns, A.J.M., Doolittle, R.F.,
    Donoghue, D.J. and Verma, I.M. (1981) Nature 289, 258-262.
65. Williams, J.G., Kay, R.M. and Patient, R.K. (1980) Nucleic Acids Res. 8,
    4247-4258.
66. Reddy, V.B., Thimmappaya, B., Dhar, R., Subramanian, K.N., Zain, B.S.,
    Pan, J., Ghosh, P.K., Celma, M.L. and Weissman, S.M. (1978) Science 200,
    494-502.
67. Jay, G., Nomura, S., Anderson, C.W. and Khoury, G. (1981) Nature 291,
    346-349.
68. Vogeli, G., Ohkubo, H., Sobel, M.E., Yamada, Y., Pastan, I. and
    deCrombrugghe, B. (1981) Proc. Natl. Acad. Sci. USA, in press.
69. Clark, S.J., Krieg, P.A. and Wells, J.R.E. (1981) Nucleic Acids Res. 9,
    1583-1590.
70. Pinck, M., Fritsch, C., Ravelonandro, M., Thivent, C. and Pinck, L. (1981)
    Nucleic Acids Res. 9, 1087-1100.
71. Hagenbuchle, O., Tosi, M., Schibler, U., Bovey, R., Wellauer, P.K. and
    Young, R.A. (1981) Nature 289, 643-646.
72. Perler, P., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R. and
    Dodgson, J. (1980) Cell 20, 555-566.
73. Wengler, G., Wengler, G. and Gross, H.J. (1979) Nature 282, 754-756.
74. Ghosh, P.K., Reddy, V.B., Swinscoe, J., Choudary, P.V., Lebowitz, P. and
    Weissman, S.M. (1978) J. Biol. Chem. 253, 3643-3647.
75. Ghosh, P.K., Reddy, V.B., Swinscoe, J., Lebowitz, P. and Weissman, S.M.
    (1978) J. Mol. Biol. 126, 813-846.
76. Cordell, B., Weiss, S.R., Varmus, H.E. and Bishop, J.M. (1978) Cell 15,
    79-91.
77. Jacobs, J.W., Goodman, R.H., Chin, W.W., Dee, P.C., Habener, J.F., Bell,
    N.H. and Potts, J.T., Jr. (1981) Science 213, 457-459.
78. Glanville, N., Durnam, D.M. and Palmiter, R.D. (1981) Nature 292, 267-269.






79. Ohtsuka, E., Nishikawa, S., Fukumoto, R., Tanaka, S., Markham, A.F.,
    Ikehara, M. and Sugiura, M. (1977) Eur. J. Biochem. 81, 285-291.
80. England, T.E. and Uhlenbeck, O.C. (1978) Biochemistry 17, 2069-2076.
81. Silberklang, M., Gillum, A.M. and RajBhandary, U.L. (1977) Nucleic Acids
    Res. 4, 4091-4108.
82. Ahlquist, P., Dasgupta, R., Shih, D.S., Zimmern, D. and Kaesberg, P.
    (1979) Nature 281, 277-282.
83. Kozak, M. (1980) Cell 19, 79-90.
84. Shih, D.S., Shih, C.T., Kew, O., Pallansch, M., Rueckert, R. and Kaesberg,
    P. (1978) Proc. Natl. Acad. Sci. USA 75, 5807-5811.
85. Pawson, T., Martin, G.S. and Smith, A.E. (1976) J. Virol. 19, 950-967.
86. Piatak, M., Ghosh, P.K., Reddy, V.B., Lebowitz, P. and Weissman, S.M.
    (1979) in Extrachromosomal DNA, ICN-UCLA Symposia on Molecular and
    Cellular Biology, Cummings, D.J., Borst, P., Dawid, I.B., Weissman, S.M.
    and Fox, C.F., Eds., Vol. XV, pp. 199-215, Academic Press, New York.
87. Preston, C.M. and McGeoch, D.J. (1981) J. Virol. 38, 593-605.






Volume 15 Number 20 1987 Nucleic Acids Research



An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs



Marilyn Kozak 



Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA 



Received July 6, 1987; Revised and Accepted September 23, 1987 



ABSTRACT 

5'-Noncoding sequences have been compiled from 699 vertebrate mRNAs.
(GCC)GCC(A/G)CCATGG emerges as the consensus sequence for initiation of
translation in vertebrates. The most highly conserved position in that motif is
the purine in position -3 (three nucleotides upstream from the ATG codon); 97% of
vertebrate mRNAs have a purine, most often A, in that position. The periodical
occurrence of G (in positions -3, -6, -9) is discussed. Upstream ATG codons
occur in fewer than 10% of vertebrate mRNAs-at-large; a notable exception
are oncogene transcripts, two-thirds of which have ATG codons preceding
the start of the major open reading frame. The leader sequences of most
vertebrate mRNAs fall in the size range of 20 to 100 nucleotides. The
significance of shorter and longer 5'-noncoding sequences is discussed.



INTRODUCTION 

To search for signals that might influence early steps in translation,
I have scrutinized the 5'-noncoding sequences of 699 vertebrate mRNAs, which
are identified in the Appendix. The survey included all sequences to which I
had access in the published literature except those in which the functional
initiator codon had not been clearly identified or where it seemed possible
that the cloned cDNA sequence fell short of the true initiator codon. To
minimize redundancy, I did not enter every available sequence for large multigene
families (especially globins, histones and immunoglobulins), but the sequences
that were omitted were usually similar to the ones that were entered; two cases
where that is not true are described in footnotes fe and n. When a particular
gene was sequenced from more than one organism, I entered both sequences if
they differed in at least two positions near the ATG codon; otherwise I
entered only one: the one for which more accessory information was available or,
arbitrarily, the human sequence. All mRNA sequences are written with T in
place of U since nearly all of the sequences were determined by analyzing DNA.



RESULTS AND DISCUSSION 
Context 

Previous surveys of eukaryotic mRNA sequences (1, 2) revealed that the
sequence flanking functional initiator codons is nonrandom: CC(A/G)CCATGG was
proposed as the consensus sequence for initiation of translation in higher
eukaryotes. The present survey confirms and extends that conclusion using a
larger and more diversified data base.
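
As an illustration only, and not part of the original survey, a tabulation of the kind summarized in Table 1 could be sketched as follows. The sketch assumes each sequence is supplied in the 16-character form used in the Appendix (12 upstream nucleotides, the ATG codon, and the nucleotide in position +4). The names `flank_frequencies` and `most_common_flank` are hypothetical, and the four sample contexts are merely a handful of Appendix entries, far too few to reproduce the published percentages.

```python
from collections import Counter

UPSTREAM = 12  # nucleotides tabulated before the ATG, as in Table 1


def flank_frequencies(contexts):
    """Percent A, C, G and T at positions -12..-1 and +4.

    Each context is assumed to be a 16-character string: 12 upstream
    nucleotides, the ATG initiator codon, and the nucleotide at +4.
    """
    if not contexts:
        raise ValueError("at least one context is required")
    positions = list(range(-UPSTREAM, 0)) + [4]
    counts = {pos: Counter() for pos in positions}
    for context in contexts:
        seq = context.upper().replace("U", "T")
        if len(seq) != UPSTREAM + 4 or seq[UPSTREAM:UPSTREAM + 3] != "ATG":
            raise ValueError(f"not a 12-nt leader + ATG + 1 nt: {context!r}")
        for offset in range(UPSTREAM):
            counts[offset - UPSTREAM][seq[offset]] += 1
        counts[4][seq[-1]] += 1
    total = len(contexts)
    return {
        pos: {base: 100 * counts[pos][base] / total for base in "ACGT"}
        for pos in positions
    }


def most_common_flank(freqs):
    """Most frequent base at each tabulated position, written around ATG."""
    upstream = "".join(
        max("ACGT", key=lambda base: freqs[pos][base])
        for pos in range(-UPSTREAM, 0)
    )
    plus4 = max("ACGT", key=lambda base: freqs[4][base])
    return upstream + "ATG" + plus4


# Four contexts transcribed from the Appendix (entries 017, 018, 022, 024).
sample = [
    "cgccagctcaccATGg",
    "caccagttcgccATGg",
    "tcttcgcacgccATGg",
    "cacgagggcaccATGg",
]
freqs = flank_frequencies(sample)
print(freqs[-3])                 # share of A, C, G, T three nt upstream of ATG
print(most_common_flank(freqs))  # ties are broken in A, C, G, T order
```

With the full set of 699 contexts, the same function would be expected to yield the percentages of Table 1, from which the consensus is read off position by position.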

Table 1. Frequency of A, C, G and T around the translational start site in
vertebrate mRNAs.

POSITION:   -12  -11  -10   -9   -8   -7   -6   -5   -4   -3   -2   -1   +4
percent A    23   26   25   23   19   23   17   18   25   61   27   15   23
percent C    35   35   35   26   39   37   19   39   53    2   49   55   16
percent G    23   21   22   33   23   20   44   23   15   36   13   21   46
percent T    19   18   18   18   19   20   20   20    7    1   11    9   15

Data were compiled from the 699 sequences listed in the Appendix. A window
of 12 nucleotides preceding the initiator codon is presented, as well as one
nucleotide (position +4) following the ATG codon. The most abundant
nucleotide in each position is underlined. Values that are >50% or at least
twice the frequency of the next most abundant nucleotide in that position are
shown in boldface.

Figure 1. Frequency of A, C, G and T around the ATG initiator codon in 699
vertebrate mRNAs, which are listed in the Appendix. The dotted line across
each panel shows the 25% value that would be expected on a random basis.

Table 1 and Fig. 1 show a distinctive pattern over the 12 nucleotide stretch
preceding the ATG initiator codon. The whole region is deficient in T
residues, especially in positions -1 to -4. C is the preferred nucleotide in
every position except -3, -6 and -9. In positions -6 and -9, G is preferred.
Position -3 shows the strongest bias: 61% of vertebrate mRNAs have an A in
that position, 36% have G, and only 3% have a pyrimidine. On the 3'-side
of the ATG codon G is the preferred flanking nucleotide. Thus the expanded
consensus sequence for initiation in vertebrate mRNAs is (GCC)GCC(A/G)CCATGG.
Site-directed mutagenesis experiments have confirmed the contribution of every
nucleotide in positions -1 to -6, as well as the G in position +4 (3, 4), but the
significance of the GCC motif in positions -9 to -7 remains to be established.
Because mutations in positions -3 and +4 have the strongest influence on
translational efficiency, for practical purposes an initiator codon can be
designated "strong" or "weak" by considering only those positions. That view is
supported by the data in Table 2, in which initiator codons are grouped according
to the nucleotides in positions -3 and +4. Among 699 vertebrate mRNAs, only 23
have a pyrimidine 3 nucleotides upstream from the functional initiator codon, and
17 of those "compensate" by having G in position +4. Thus only six mRNAs out of
699 lack the preferred nucleotide in both of the crucial positions. One would
expect those six mRNAs to be translated very inefficiently, which is not
inconsistent with the fact that four of them encode hormones or lymphokines
(entries 264, 397, 407 and 650). The other two (entries 172 and 281) are members
of large multigene families, for which reason one cannot assess the extent to
which the particular mRNA that corresponds to the cloned cDNA is translationally
active.
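
The two-position shorthand described above can be sketched in a few lines. This is a hedged illustration rather than the author's procedure. The function names and the three-way labelling ("strong", "intermediate", "weak") are assumptions made for the example, since the text itself speaks only of "strong" and "weak". The counts are the pyrimidine-at-(-3) rows as read from Table 2.

```python
PURINES = frozenset("AG")


def preferred_count(context):
    """Count preferred nucleotides at -3 (purine) and +4 (G): 0, 1 or 2.

    The context is assumed to be 12 upstream nucleotides + ATG + position +4,
    the 16-character form used in the Appendix.
    """
    seq = context.upper().replace("U", "T")
    if len(seq) != 16 or seq[12:15] != "ATG":
        raise ValueError(f"not a 12-nt leader + ATG + 1 nt: {context!r}")
    return int(seq[9] in PURINES) + int(seq[15] == "G")


def label(context):
    # Hypothetical three-way labelling chosen for this sketch.
    return ("weak", "intermediate", "strong")[preferred_count(context)]


# Functional initiator codons with a pyrimidine at -3, as read from Table 2,
# keyed by (nucleotide at -3, nucleotide at +4); Y stands for pyrimidine.
PYRIMIDINE_ROWS = {
    ("C", "G"): 9, ("C", "A"): 2, ("C", "Y"): 4,
    ("T", "G"): 8, ("T", "A"): 0, ("T", "Y"): 0,
}

pyrimidine_total = sum(PYRIMIDINE_ROWS.values())
compensated = sum(n for (_, plus4), n in PYRIMIDINE_ROWS.items() if plus4 == "G")
lacking_both = pyrimidine_total - compensated

print(label("cgccagctcaccATGg"))  # Appendix entry 017
print(pyrimidine_total, compensated, lacking_both)
```

The three totals reproduce the figures quoted in the text: 23 initiator codons with a pyrimidine in position -3, 17 of them with G in position +4, and six lacking both preferred nucleotides.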

Considerable evidence (3) supports the idea that the ATG codon and flanking
sequences function as a "stop signal" for the migrating 40S ribosomal subunit,
which binds directly only at the 5'-end of the mRNA. Consequently, position
as well as context determines which ATG is the functional initiator codon.
In cases where there are two ATG codons in equally favorable contexts near the
5'-end of a message, it would be incorrect to conclude that either ATG is equally
likely to initiate translation. Theory predicts and experiments confirm (54)
that ribosomes initiate exclusively at the 5'-proximal ATG codon when it lies
in a favorable context.

Table 2. Sequences flanking ATG codons in vertebrate mRNAs

Sequence        Functional          "Nonfunctional"(a)
(-3 ... +4)     initiator codons    upstream(b) ATG codons

AnnATGG              175                   4
AnnATGA              114                   5
AnnATGC               63                   8
AnnATGT               73                   4
GnnATGG              130                   8
GnnATGA               47                   7
GnnATGC               47                   5
GnnATGT               27                   5
CnnATGG                9                   7
CnnATGA                2                   8
CnnATGY                4                  12
TnnATGG                8                   4
TnnATGA                0                  13
TnnATGY                0                  16
Total                699                 106 ATGs(c) in 59 mRNAs

(a) "Nonfunctional" is a provisional designation. Upstream ATG codons are
expected to function, an expectation that has been verified with some viral
mRNAs but not yet with cellular mRNAs. The more important point is that the
indicated upstream ATG codons are not absolute barriers to initiating
downstream: a downstream ATG codon starts the major open reading frame in
these mRNAs.

(b) The tabulation of upstream ATG codons does not include the 34 oncogenes
listed in the Appendix, since they comprise a separate group vis-a-vis the
frequency of upstream ATG codons (see text).

(c) Thirty of the upstream ATG codons in this set derive from just four mRNAs:
entries 73, 283, 556 and 599. Excepting those four entries and the
proto-oncogenes, only 9% of vertebrate mRNAs have upstream ATG codons and they
typically have only one.

Table 3. Length distribution of vertebrate mRNA leader sequences.

Length (nucleotides):    <10   10-19   20-29   30-39   40-49   50-59   60-69   70-79
# of mRNAs:                4      10      23      29      36      38      37      40

Length (nucleotides):  80-89   90-99  100-199  200-299  300-399  400-499  500-599   >600
# of mRNAs:               15      22       68       19        1        3        2      2

The table is based on 346 mRNAs for which the transcriptional start site has been
mapped. In the case of genes that produce multiple transcripts, only the longest
leader was scored. In three cases where ribosomes initiate at the first and
second ATG codons (see footnote g in the Appendix), two values were entered in the
table; in all three cases there are fewer than 10 nucleotides between the cap and
the first functional ATG codon.

The repetition of G in positions -3, -6 and -9 is quite noticeable in
Fig. 1. Trifonov (5) has pointed out that there is a strong preference for G in
the first position of codons throughout the coding region of both prokaryotic
and eukaryotic mRNAs, and he postulates that the periodicity of G residues
helps ribosomes stay in frame during translation. An interesting possibility
is that "frame monitoring" begins shortly upstream from the initiator codon in
eukaryotes. In support of that idea, it has been shown by site-directed
mutagenesis that correct initiation is strongly favored by placing a purine in
position -3 and G in position -6, but the facilitating effect is completely
lost when the purines are shifted one nucleotide to the left or right (3, 4).

In recent surveys of mRNA sequences from plants (6), Drosophila (7) and
yeast (8), the most striking finding was conservation of a purine, usually A,
in position -3. The reported values for position -3 were 53% A and 23% G in
47 plant mRNAs; 82% A and 13% G in 77 Drosophila mRNAs; and 81% A in 96 yeast
mRNAs. Although such data encourage the idea that A or G in position -3
somehow favors initiation in all eukaryotic systems, the effect of context on
initiation has yet to be tested experimentally in nonvertebrates. The overall
A-richness of leader sequences in yeast and plant mRNAs somewhat diminishes
the statistical significance of finding A most often in position -3. Leader
sequences on Dictyostelium mRNAs are also notoriously A+T rich. This might be
a hint that ribosomes from lower eukaryotes and plants are less able to deal
with secondary structure than are metazoan ribosomes (9).
Upstream ATG codons 

Three points about the occurrence of upstream ATG codons merit comment.

(i) They are relatively rare in vertebrate mRNAs-at-large, as indicated in
Table 2. The raw data from which Table 2 was compiled are given in the
Appendix.

(ii) The big exception to the foregoing generalisation are proto-oncogenes
(entries 454 to 487), nearly two-thirds of which produce mRNAs that have ATG
codons (usually more than one) preceding the start of the major open reading
frame. In view of the inhibitory effect of upstream ATG codons (3), it is
probably not an accident that activation of proto-oncogenes by transduction or
translocation often deletes the cumbersome leader sequence. Preliminary
evidence (10, 11) encourages the hypothesis that the expression of some oncogenes
is regulated in part at the level of translation.

(iii) The context around "nonfunctional" upstream ATG codons differs
strikingly from the functional initiator codons listed in Table 2. Whereas functional
initiator codons are rarely preceded by a pyrimidine in position -3, upstream
ATG codons often occur in that unfavorable context. The notion that scanning
is "leaky" in such mRNAs (with some 40S ribosomal subunits stopping and
initiating at the upstream site while some reach the second ATG codon) is supported
by some experimental evidence (3, 12).

Leaky scanning obviously cannot account for the ability of ribosomes to
initiate downstream from a strong ATG codon, and a considerable number of the
upstream ATG codons in Table 2 do occur in a favorable context. Those ATG
codons are nearly always followed by a terminator codon, however, and it seems
likely that, after translating the upstream "minicistron" and terminating,
ribosomes reinitiate at the next ATG codon downstream (12, 13 and references
therein). Given leaky scanning and reinitiation as devices with some
experimental justification that permit initiation at an ATG codon that is not "first,"
an upstream ATG codon should pose a problem only if it occurs in a favorable
context for initiation and is not followed by a terminator codon before the
start of the major open reading frame. Among the 699 mRNAs listed in the
Appendix, only five have that problematical structure; they are entry 120 (where
the upstream ATG codon lies very close to the cap and hence might be
inefficient; see below); entries 520, 553 and 599 (where the potentially inhibitory
ATG codon is preceded by a far-upstream minicistron which might have a sparing
effect, as explained in reference 13); and entry 247, for which I have no
excuse.
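
The criterion stated above (an upstream ATG is a problem only when it lies in a favorable context and is not followed by an in-frame terminator before the major open reading frame) lends itself to a short sketch. The code below is illustrative and rests on stated assumptions. A "favorable context" is approximated by a purine in position -3 alone, `main_start` is taken to be the 0-based index of the functional ATG, and the two test sequences are invented for the example rather than drawn from the Appendix.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_atgs(mrna, main_start):
    """Describe every ATG that precedes the functional initiator codon.

    mrna       -- sequence from the 5'-end onward (T or U accepted)
    main_start -- 0-based index of the A of the functional ATG
    """
    seq = mrna.upper().replace("U", "T")
    found = []
    pos = seq.find("ATG")
    while pos != -1 and pos < main_start:
        # Assumed test for a favorable context: purine at position -3.
        favorable = pos >= 3 and seq[pos - 3] in "AG"
        # Walk codon by codon in the frame of this ATG, stopping short of
        # the main initiator codon.
        terminated = any(
            seq[i:i + 3] in STOP_CODONS
            for i in range(pos + 3, main_start - 2, 3)
        )
        found.append(
            {"index": pos, "favorable": favorable, "terminated": terminated}
        )
        pos = seq.find("ATG", pos + 1)
    return found


def problematic(mrna, main_start):
    """Indices of upstream ATGs that are favorable and not terminated."""
    return [
        u["index"]
        for u in upstream_atgs(mrna, main_start)
        if u["favorable"] and not u["terminated"]
    ]


# Invented leaders: an upstream "minicistron" with and without a terminator.
with_stop = "GGACC" + "ATG" + "GCT" + "TAA" + "GGCGCCACC" + "ATG" + "GCT"
without_stop = "GGACC" + "ATG" + "GCT" + "GCA" + "GGCGCCACC" + "ATG" + "GCT"

print(problematic(with_stop, 23))     # minicistron terminates before the main ATG
print(problematic(without_stop, 23))  # upstream ATG reads through in frame
```

A stricter definition of favorable context (for example, also requiring G in position +4) could be substituted without changing the structure of the check.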

It might be noted parenthetically that upstream ATG codons seem to occur
more commonly in Drosophila than in vertebrate mRNAs, although I cannot cite
precise statistics for Drosophila, nor is it always certain that the
ATG-burdened leader sequence belongs to a functional form of mRNA (14).
Leader length 

The precise length of the 5'-noncoding sequence is known for about half
of the entries in the Appendix, and those mRNAs were used to compile the data
in Table 3. Only one-fourth of the mRNAs that were scored have a leader
sequence longer than 100 nucleotides. Thus the leader sequences on most vertebrate
mRNAs fall in the range of 20 to 100 nucleotides. Note that the mRNAs derived
from proto-oncogenes are again atypical, as nearly all of them have very long
leader sequences. Also note some extraordinarily long leader sequences (400
to >800 nucleotides) that contain no upstream ATG codons; see entries 523, 524
and 650.
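
The proportions quoted in this paragraph can be checked directly against the counts in Table 3. The short sketch below is an illustration, with the bin labels and counts transcribed from that table. The helper name `lower_bound` is an assumption of the example. The total is 349 rather than 346 because, as the table legend explains, two values were entered for three of the mRNAs.

```python
# (length bin, number of mRNAs), transcribed from Table 3.
TABLE_3 = [
    ("<10", 4), ("10-19", 10), ("20-29", 23), ("30-39", 29),
    ("40-49", 36), ("50-59", 38), ("60-69", 37), ("70-79", 40),
    ("80-89", 15), ("90-99", 22), ("100-199", 68), ("200-299", 19),
    ("300-399", 1), ("400-499", 3), ("500-599", 2), (">600", 2),
]


def lower_bound(bin_label):
    """Smallest length named in a bin label such as '20-29' or '>600'."""
    return int(bin_label.lstrip("<>").split("-")[0])


total = sum(count for _, count in TABLE_3)
at_least_100 = sum(
    count for bin_label, count in TABLE_3
    if not bin_label.startswith("<") and lower_bound(bin_label) >= 100
)
from_20_to_99 = sum(
    count for bin_label, count in TABLE_3
    if not bin_label.startswith("<") and 20 <= lower_bound(bin_label) < 100
)

print(total, at_least_100, from_20_to_99)   # 349 95 240
print(f"{at_least_100 / total:.0%}")        # 27%
print(f"{from_20_to_99 / total:.0%}")       # 69%
```

Roughly 27% of the scored leaders are 100 nucleotides or longer and about 69% fall between 20 and 99 nucleotides, in line with the "one-fourth" and "most" of the text.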

The effects of leader length on translational efficiency are just
beginning to be explored. There is some evidence that an ATG codon is not
recognized efficiently when it occurs close to (within 10 nucleotides of) the cap;
that might explain the rare examples of cellular mRNAs in which ribosomes
initiate at the first and second ATG codons (see entries 143, 297, 330 and
footnote g in the Appendix). A few viral mRNAs that seem to translate efficiently
have leader sequences only 9 or 10 nucleotides long (15-17), but the
possibility that some ribosomes reach the second ATG in those mRNAs has not been ruled
out. In the case of SV40 late 16S mRNA, the ATG codon that initiates the
agnoprotein occurs 10 nucleotides down from the cap and it is clearly recognized
inefficiently, despite a favorable context; lengthening the leader sequence
by 33 nucleotides seems to improve initiation at the agnoprotein start site,
with the result that fewer ribosomes reach the downstream VP1 start site (18).
In the case of vaccinia virus, a novel transcriptional maneuver (19) adjusts
both the length and the context in a way that favors the efficient translation
of late mRNAs.

Whereas a leader sequence that is too short might be deleterious, there
is no evidence that long leader sequences are incompatible with efficient
translation provided that inhibitory features (notably secondary structure and
upstream ATG codons) are avoided. Many of the naturally occurring, long
5'-noncoding sequences are G+C rich, however, and therefore secondary structure might
negate any advantage of length. From an opposite perspective, the presence of
secondary structure in GC-rich leader sequences might necessitate that they be
long, since length seems to overcome the inhibitory effect of secondary
structure in some experimental constructs (M.K., unpublished data).
Errors of note 

By comparing two independently derived cDNA sequences for a particular mRNA
or by comparing a cDNA with the corresponding genomic sequence, one can spot
certain types of errors. The mistakes encountered most frequently when
analyzing 5'-noncoding sequences merit comment.

(i) cDNA sequences sometimes correspond, not to the functional mRNA, but to a
partially processed precursor that retains an intron in the 5'-noncoding
sequence (20-22). Several of the long, ATG-burdened 5'-sequences that have
appeared in the literature represent introns that are not present in the mature
functional mRNA (23-25). The abundance of upstream ATG codons in the mRNA
that encodes a 70K protein associated with U1 RNA (entry 556 in the Appendix)
raises the interesting possibility that that cDNA corresponds to an
incompletely spliced transcript, and that the splicing machinery (of which U1 snRNPs
are a part) is itself regulated by the efficiency of splicing. With other genes
there is indeed experimental evidence for regulation at the level of retaining
or removing a 5'-intron (14).

(ii) cDNA sequences sometimes correspond to minor mRNA species that have
unusually long (sometimes ATG-afflicted) leader sequences. There are many examples
in which S1 nuclease mapping has revealed the bulk mRNA population to have a
shorter leader sequence than the longest cDNA (26-32). It is reassuring,
therefore, when steps are taken to show that an unusually long leader sequence is
really representative of the mRNA population (33, 34).

(iii) Even with S1 mapping, the major transcriptional start site has sometimes
been misidentified. For example, the more abundant leader on mouse
dihydrofolate reductase mRNA was missed because its shorter length and lower GC
content made it less stable (as a DNA-RNA hybrid) than a minor, long leader
sequence (35).

(iv) The primer extension technique is error prone, resulting in frequent
mistakes in the deduction of 5'-noncoding sequences (36-41; compare 42 with 43
[albumin]; 44 with 45 [ferritin]; 46 with 47 [parathyroid hormone]; 48 with 49
[pyruvate kinase]; 50 with 51 [IGF-II]; and 52 with 53 [X-CGD]). With perverse
consistency, such cloning errors generate upstream ATG codons that are not
really present in the mRNA.



ACKNOWLEDGEMENT 

Research funds were provided by a grant from the National Institutes of
Health (GM33915). I thank the staff of Langley Library for their help.

REFERENCES 

1. Kozak, M. (1981) Nucl. Acids Res. 9, 5233-5252.
2. Kozak, M. (1984) Nucl. Acids Res. 12, 857-872.
3. Kozak, M. (1986) Cell 44, 283-292.
4. Kozak, M. (1987) J. Mol. Biol. 196, 947-950.
5. Trifonov, E.N. (1987) J. Mol. Biol. 194, 643-652.
6. Heidecker, G. and Messing, J. (1986) Ann. Rev. Plant Physiol. 37, 439-466.
7. Cavener, D.R. (1987) Nucl. Acids Res. 15, 1353-1361.
8. Hamilton, R., Watanabe, C.K. and de Boer, H.A. (1987) Nucl. Acids Res. 15,
    3581-3593.
9. Kozak, M. (1986) Proc. Natl. Acad. Sci. USA 83, 2850-2854.
10. Propst, F., Rosenberg, M., Iyer, A., Kaul, K. and Vande Woude, G.F. (1987)
    Mol. Cell. Biol. 7, 1629-1637.
11. Ratner, L., Thielan, B. and Collins, T. (1987) Nucl. Acids Res. 15, 6017-6036.
12. Kozak, M. (1986) Cell 47, 481-483.
13. Kozak, M. (1987) Mol. Cell. Biol. 7, in press.






14. Gaul, U., Seifert, E., Schuh, R. and Jäckle, H. (1987) Cell 50, 639-647.
15. Dasgupta, R., Shih, D., Saris, C. and Kaesberg, P. (1975) Nature 256, 624-628.
16. Rose, J.K. (1980) Cell 19, 415-421.
17. Collins, P.L. and Wertz, G.W. (1985) J. Virol. 54, 65-71.
18. Grass, D.S. and Manley, J.L. (1987) J. Virol. 61, 2331-2335.
19. Schwer, B., Visca, P., Vos, J. and Stunnenberg, H. (1987) Cell 50, 163-169.
20. McPhaul, M. and Berg, P. (1987) Mol. Cell. Biol. 7, 1841-1847.
21. Larhammar, D., Hammerling, U., Rask, L. and Peterson, P. (1985) J. Biol.
    Chem. 260, 14111-14119.
22. Ueda, K., Clark, D., Chen, C., Roninson, I., Gottesman, M.M. and Pastan, I.
    (1987) J. Biol. Chem. 262, 505-508.
23. Wells, D. and Kedes, L. (1985) Proc. Natl. Acad. Sci. USA 82, 2834-2838.
24. Li, S., Tiano, H., Fukasawa, K., Yagi, K., Shimizu, M., Sharief, F.,
    Nakashima, V. and Pan, Y.E. (1985) Eur. J. Biochem. 149, 215-225.
25. Fukasawa, K.M. and Li, S. (1986) Biochem. J. 235, 435-439.
26. Tsuchiya, M., Kaziro, Y. and Nagata, S. (1987) Eur. J. Biochem. 165, 7-12.
27. Shahan, K., Gilmartin, M. and Derman, E. (1987) Mol. Cell. Biol. 7, 1938-1946.
28. Rixon, M., Chung, D. and Davie, E. (1985) Biochemistry 24, 2077-2086.
29. Persico, M.G., Viglietto, G., Martini, G., Toniolo, D., Paonessa, G., Moscatelli,
    C., Bono, R., Vulliamy, T., Luzzatto, L. and D'Urso, M. (1986) Nucl. Acids
    Res. 14, 2511-2522.
30. Kobilka, B., Frielle, T., Dohlman, H., Bolanowski, M., Dixon, R., Keller, P.,
    Caron, M. and Lefkowitz, R.J. (1987) J. Biol. Chem. 262, 7321-7327.
31. Akeson, A., Wiginton, D., States, J.C., Perme, C., Dusing, M. and Hutton,
    J.J. (1987) Proc. Natl. Acad. Sci. USA 84, 5947-5951.
32. Dente, L., Pizza, M.G., Metspalu, A. and Cortese, R. (1987) EMBO J. 6, 2289-2296.
33. Conboy, J., Kan, Y.W., Shohet, S. and Mohandas, N. (1986) Proc. Natl. Acad. Sci.
    USA 83, 9512-9516.
34. Peralta, E., Winslow, J., Peterson, G., Smith, D., Ashkenazi, A.,
    Ramachandran, J., Schimerlik, M. and Capon, D.J. (1987) Science 236, 600-605.
35. Sazer, S. and Schimke, R.T. (1986) J. Biol. Chem. 261, 4685-4690.
36. Ruppert, S., Scherer, G. and Schütz, G. (1984) Nature 308, 554-557.
37. Auron, P., Webb, A., Rosenwasser, L., Mucci, S., Rich, A., Wolff, S.M. and
    Dinarello, C.A. (1984) Proc. Natl. Acad. Sci. USA 81, 7907-7911.
38. Ahn, T.G., Cohn, D.V., Gorr, S.U., Ornstein, D.L., Kashdan, M.A. and Levine,
    M.A. (1987) Proc. Natl. Acad. Sci. USA 84, 5043-5047.
39. Hall, L., Craig, R.K., Edbrooke, M.R. and Campbell, P.N. (1982) Nucl. Acids
    Res. 10, 3503-3515.
40. Claesson, L., Larhammar, D., Rask, L. and Peterson, P.A. (1983) Proc. Natl.
    Acad. Sci. USA 80, 7395-7399.
41. Daddona, P.E., Shewach, D.S., Kelley, W.N., Argos, P., Markham, A.F. and
    Orkin, S.H. (1984) J. Biol. Chem. 259, 12101-12106.
42. Lawn, R.M., Adelman, J., Bock, S.C., Franks, A.E., Houck, C.M., Najarian,
    R.C., Seeburg, P.H. and Wion, K.L. (1981) Nucl. Acids Res. 9, 6103-6114.
43. Minghetti, P., Ruffner, D., Kuang, W-J., Dennison, O., Hawkins, J., Beattie,
    W.G. and Dugaiczyk, A. (1986) J. Biol. Chem. 261, 6747-6757.
44. Dorner, M.H., Salfeld, J., Will, H., Leibold, E.A., Vass, J.K. and Munro,
    H.N. (1985) Proc. Natl. Acad. Sci. USA 82, 3139-3143.
45. Santoro, C., Marone, M., Ferrone, M., Costanzo, F., Colombo, M., Minganti,
    C., Cortese, R. and Silengo, L. (1986) Nucl. Acids Res. 14, 2863-2876.
46. Kronenberg, H.M., McDevitt, B., Majzoub, J., Nathans, J., Sharp, P.A.,
    Potts, J.T., Jr. and Rich, A. (1979) Proc. Natl. Acad. Sci. USA 76,
    4981-4985.
47. Weaver, C.A., Gordon, D.F. and Kemper, B. (1982) Mol. Cell. Endocrinology
    28, 411-424.
48. Inoue, H., Noguchi, T. and Tanaka, T. (1986) Eur. J. Biochem. 154, 465-469.
49. Cognet, M., Lone, Y.C., Vaulont, S., Kahn, A. and Marie, J. (1987) J. Mol.
    Biol. 196, 11-25.






50. Soares, M.B., Ishii, D.N. and Efstratiadis, A. (1985) Nucl. Acids Res. 13,
    1119-1134.
51. Soares, M.B., Turken, A., Ishii, D., Mills, L., Episkopou, V., Cotter, S.,
    Zeitlin, S. and Efstratiadis, A. (1986) J. Mol. Biol. 192, 737-752.
52. Royer-Pokora, B., Kunkel, L., Monaco, A., Goff, S., Newburger, P., Baehner,
    R., Cole, P.S., Curnutte, J.T. and Orkin, S.H. (1986) Nature 322, 32-38.
53. Orkin, S.H. (1987) Trends in Genetics 3, 207.
54. Kozak, M. (1983) Cell 34, 971-978.






APPENDIX 

Entry                                Leader    Sequence flanking   Upstream ATGs
No.   Messenger RNA/source           length    initiator codon     w/t  s/t       References

001   α nicotinic, hu muscle                   ctceggtagcccATGg                   Noda '83 Nature 305, 818
002   α nicotinic, rat neural        225       cgggtcttagacATGg                   Boulter '86 Nat 319, 368
003   β nicotinic, bo muscle                                                      Tanabe '84 EJB 144, 11
004   γ nicotinic, hu muscle                   agctgaggcaccATGc                   Shibahara '85 EJB 146, 15
005   ε nicotinic, bo muscle                   ccagacagcggaATGg                   Takai '85 Nat 315, 761
006                                  270, 400                      3              Peralta '87 Sci 236, 600
007   cerebral muscarinic, po        >444      ooacccagcaccATGa    1 2 1          Kubo '86 Nature 323, 411
008   α1 acid glycoprotein, hu                 cctggtotcagtATGg                   Dente '87 EMBO 6, 2289
009   acid glycoprotein, rat         40        agtgtcttcggcATGg                   Liao '85 MCB 5, 3634

      Actins
010   α-skeletal, mu                 70 *      aaact&gacaccATGt                   Hu '86 MCB 6, 15
011   α-skeletal, ch                 73 *      acagccagcaacATGt                   Fornwald '82 NAR 10, 3861
012   α-skeletal, Xp                 >60       ccagcctcaaacATGt                   Stutz '86 JMB 167, 349
013   α-cardiac, Xp                  >53 *
014   α-cardiac, ch                  60 *      ctatcagccaagATGt                   Chang '85 NAR 13, 1223
015   α-smooth muscle, hu                      gcagctccagctATGt                   Ueyama '84 MCB 4, 1073
016   α-smooth muscle, ch            88 *                                         Carroll '86 JBC 261, 8965
017   β-cytoplasmic, hu              84 *      cgccagctcaccATGg                   Ng '85 MCB 5, 2720
018   β-cytoplasmic, rat             80 *      caccagttcgccATGg                   Nudel '83 NAR 11, 1759
019   β-cytoplasmic, ch              96 *      ccacagccagccATGg
020   γ-cytoplasmic, hu              >73       etgccggtcgcaATGg                   Erba '86 NAR 14, 5275
021   3rd cytopl isoform, ch         49 *      gcaggcccaatcATGg                   Bergsma '85 MCB 5, 1151
022   APRT, hu                                 tcttcgcacgccATGg                   Broderick '87 PNAS 84, 3349
023   APRT, mu                       60        acgcacgcggccATGt                   Dush '85 PNAS 82, 2731
024   Adenosine deaminase, hu        95        cacgagggcaccATGg                   Ingolia '86 MCB 6, 4458
025   Adenosine deaminase, mu        •W135     acgctoggaaccATGg
026   AdoHcy hydrolase, rat                    gacttcgccagcATGg                   Ogawa '87 PNAS 84, 719
027   Adenylate kinase, ch           >57       cacagcagcagcATGt                   Kishi '86 JBC 261, 2942
028   Adipocyte P2, mu               67        aaggtttacaaaATGt                   Hunt '86 PNAS 83, 3786
029   Adrenodoxin, bo                >164      ccccgacaggctATGg                   Okamura '85 PNAS 82, 5705
030   Albumin, hu serum              39        gcctttggcacaATGa                   Minghetti '86 JBC 261, 6747
031   Albumin, ch serum              41        taatctgcagccATGa                   Hache '83 JBC 258, 4556
032   ADH, α subunit, hu class I               gacagaatcaacATGa                   Ikuta '86 PNAS 83, 634


033 


ADH, 6 subunit, hu M 


70 


gacagaaacgacATCa 




Duester '86 JBC 261,2027 


034 


ADH, y subunit, hu " 


>eo 


gacagaatcaatATGa 


1 


Hoog 'B6 EJB 159,215 


035 


ADH-AA, mu liver 


>101 


aggacagacggcATCa 




1 Edenberg 1 85 PNAS 82 , 2262 


036 


ALDH, hu liver 


>30 


tagcccgctgcgATCt 




Braun '87 NAR 15,3179 


037 


Aldolase A, rat 


66, 116* 


gccaccggcaccATGc 




Joh '86 JMB 190,401 


038 


Aldolase B, rat 


Bl * 


gtacctgtcatcATCg 




Tsutsumi '85 JMB 181,153 


039 


Aldolase B, ch 


72 * 


caataagtcaccATGa 




Burgess '85 JBC 260,4604 


040 


Aldolase c, mu 


>60 


aeaactgteatcATCc 




Paolella '86 EJB 156,229 


041 


ALP, hu intestinal 




tqoccccaagaoATCc 




Henthorn ' 87 PNAS B4.1234 


042 


ALP, hu liver/bone 


>176 


ttggggtgcaccATGa 




Weiss '86 PNAS 83,7182 


043 


ALA-D, hu 


>66 


ctggeecaegccATCc 


1 


Wetmur 'B6 PKAS 63,7703 


044 


ALA-D, rat 


>48 


ccggoceceaecATCc 




Bishop '86 KAR 14,10115 


045 


5-ALV synthase, ch 


81 


gcaggaggaaggATGg 




Haguire * 86 NAR 14,1379 


046 


a -amy las o , hu salivary 


220 * 


cttcaaagcaaaATGa 




Nishide '86 Gene 41,299 


047 


a-amylase, mu salivary 


95 * 


cagcatagcaaaATGa 


1 


HagenbOchle'81 Nat 289,643 
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048 


a-amylasa, rau pancreatic 


17 


cttcaaagcaaaATGa 






Hagenbuchle*80C«ll 21,179 


049 | Amyloid-A4 (Alzheimer's) | >146 | cgeagggtegegATGc | | Kang '87 Nature 325, 733
050 | Amyloid (SAA2), mu serum | 34 * | gaeaccagcaggATGa | | Lowell '86 JBC 261, 8442
051 | Androgen-BP, 45K, rat | >33 | cagctgctaactATGg | | Joseph '87 PNAS 84, 339
052 | And.-induced RP2, mu | >40 | aggacgccggccATGa | | King '86 NAR 14, 5159
053 | And-induced S-protein, rat | >51 | ttttctggcaagATGa | | McDonald '84 EMBO 3, 2517
054 | Angiotensinogen, rat | 61 * | cacagatccgtgATGa | | Tanaka '84 JBC 259, 8063
055 | Antifreeze protein, flounder | 49 | caagttctcaaaATGg | | Davies '84 JBC 259, 9241
056 | " ocean pout | >57 | tcagccacagccATGa | | Li '85 JBC 260, 12904
057 | Arginase, hu liver | >56 | aagtgtcanagcATGa | | Haraguchi '87 PNAS 84, 412
058 | Arginase, rat liver | >26 | ccctggatgagcATGa | 1 | Kawamoto '87 JBC 262, 6280
059 | Arginosuccinate lyase, hu | >114 | gaaccgcccaacATGg | | O'Brien '86 PNAS 83, 7211
060 | Arginosucc. synthase, hu | >75 | atcQcagacgctATGt | | Bock '83 NAR 11, 6505
061 | AspAT, mu mitochondrial | >88 | ttaccgcccaccATGg | | Obaru '86 JBC 261, 16976
062 | AspAT, mu cytoplasmic | >54 | cattctgtcgcgATGg | | "
063 | AspAT, ch mitochondrial | | cacgctgccgccATGg | | Jaussi '85 JBC 260, 16060
064 | ATP/ADP carrier, hu | >70 | ccttctttcaacATGa | | Battini '87 JBC 262, 4355
065 | ATPase-Ca2+, ra slow twch | >129 | gcccccgcagccATGg | | MacLennan '85 Nat 316, 696
066 | ATPase-Ca2+, ra fast twch | >81 | gaagggagcgcaATGg | | Brandl '86 Cell 44, 597
067 | ATPase (Na+K+)-α, rat brain | >237 | agcgcegccaccATGg | |
068 | " alii • | >140 | ggagccgccaagATGg | |
069 | " " β, human | >120 | cccgccatcgccATGg | 2 | Kawakami '86 NAR 14, 2833
070 | " β, rat kidney | >460 | tgagcagacaccATGg | | Young '87 JBC 262, 4905
071 | " α, sh kidney | >264 | accaccgccgctATGg | | Shull '85 Nature 316, 691
072 | " " β, sh kidney | >528 | tgacccgccaccATGg | | Shull '86 Nature 321, 429
073* | " (H+K+), rat stomach | >206 | cacctagccaccATGg | 5 4 | Shull '86 JBC 261, 16788
074 | ATPase, Xp mitochondrial | | caagccgcagtcATGt | | Weeks '87 PNAS 84, 2798
075 | Atrial natriuretic factor | 90 | acccacgccagcATGg | | Argentin '85 JBC 260, 4568
076 | Avidin, ch | >43 | cctgctgcagagATGg | | Gope '87 NAR 15, 3595
077 | BPGM, hu erythrocyte | >110 | tcagccatcagtATGt | 1 | Joulin '86 EMBO 5, 2275
078 | Bone Gla protein, rat | >48 | ctagcagacaccATGa | | Celeste '86 EMBO 5, 1885
079 | Brain S100-α protein, bo | >89 | gtaagcttcgagATGg | | Kuwano '86 FEBS 202, 97
080 | Brain S100-β protein, rat | >120 | ggagcctccgggATGt | | Kuwano '84 NAR 12, 7455
081 | Brain specif. gene 0-44, rat | 66 + | taggcegccgagATGg | + | Tsou '86 MCB 6, 768
082 | C-reactive protein, hu | 104 | caggacgtgaccATGg | | Lei '85 JBC 260, 13377
083 | Caerulein, Xp | >50 | ccttctgaaagcATGt | | Richter '86 JBC 261, 3676
084 | Calcitonin, hu | >7« | cagagaggtgtcATGg | | Jonas '85 PNAS 82, 1994
085 | Calcitonin, rat | 132 | cagggaggcatcATGg | | Amara '84 MCB 4, 2151
086 | Ca2+-binding protein, ch | >102 | tgoacococaaoATGa | | Hunziker '86 PNAS 83, 7578
087 | Ca2+ " rat brain | >130 | agocgctgcaccATGg | | Yamakuni '86 NAR 14, 6768
Calmod. family Ca2+ binding proteins:
088 | Calmodulin, rat | 85 | ttcgctcgcaccATGg | | Nojima '87 JMB 193, 439
089 | Calmodulin pRCM3, rat | >70 | ageccttgcagcATGg | | Nojima '87 MCB 7, 1873
090 | Calmodulin, ch | 91 | ggccgagccaccATGg | | Putkey '83 JBC 258, 11864
091 | Calmodulin, Xp | >70 | aactattccgaaATGg | | Chien '84 MCB 4, 507
092 | Oncomodulin, rat | 97 | gcgggacagaaaATGa | | Gillen '87 JBC 262, 5308
093 | Parvalbumin, rat | 73 * | ceaagttgcaggATGt | | Epstein '86 JBC 261, 5886
094 | Calpain, po | >90 | tgagtcacagccATGt | 1 | Sakihama '85 PNAS 82, 6075
095 | Calcyclin (S100-related), hu | 65 * | cagccctcagccATGg | | Ferrari '87 JBC 262, 6325
096 | Carbamyl-P-synthetase, rat | 140 | aacatcttcaaaATGa | |










097 | Carbonic anhydrase I, hu | | cagtagaagataATGg | | Barlow '87 NAR 15, 2386
098 | Carbonic anhydrase II, mu | 60 | accggegtgaccATGt | | Venta '85 JBC 260, 12130
099 | Carbonic anhydrase III, hu | >43 | aggaaggegaccATGg | | Lloyd '86 Gene 41, 233
100 | Carbonic anhydrase II, ch | 39 | ggccggcgcaccATGt | | Yoshihara '87 NAR 15, 753
101 | Cartilage-link protein, ch | >135 | gtgactgtgaagATGa | | Deak '86 PNAS 83, 3766
102 | α-Casein, rat | 62 * | atcttagcaaccATGa | | Yu-Lee '86 NAR 14, 1883
103 | β-Casein, rat | 52 * | gacttgacagccATGa | | Jones '85 JBC 260, 7042
104 | β-Casein, mu | >55 | gacttgeeagccATGa | | Yoshimura '86 NAR 14, 8224
105 | γ-Casein, rat | 56 * | gatcaagtaaccATGa | | Hobbs '82 NAR 10, 8079
106 | Catalase, hu kidney | 70 | aaaccgcacgctATGg | | Quan '86 NAR 14, 5321
107 | Catalase, rat liver | >83 | caatcctacaccATGg | | Furuta '86 PNAS 83, 313
108 | Cathepsin B, hu | >195 | ctggcttccaacATGt | 1 | Chan '86 PNAS 83, 7721
109 | Cathepsin D, hu | >51 | gccgccgccgccATGc | | Faust '85 PNAS 82, 4910
110 | Cholecystokinin, hu | 64 * | aatccaaaagccATGa | | Takahashi '86 Gene 50, 353
111 | Cholecystokinin, rat | 59 * | geatccgaagatATGa | | Deschenes '85 JBC 260, 1280
112 | Chromogranin, bo | 180 | cccggcttcgceATGo | | Benedum '86 EMBO 5, 1495
113 | proChymosin, bo | 25 | cccagatccaagATGa | | Hidaka '86 Gene 43, 197
114 | Chymotrypsinogen-B, rat | 22 | ttgaccagcaccATGg | | Bell '84 JBC 259, 14265
Coagulation factors & modulators: see also Fibrinogen, Inhibitors, PA, Thrombospondin
115 | Factor VII, hu | >35 | agagatttcatcATGg | | Hagen '86 PNAS 83, 2412
116 | Factor VIII, hu | 170 | tagcaataagtcATGc | 1 | Gitschier '84 Nat 312, 326
117 | Factor X, bo | >75 | aagggceccaccATGg | | Fung '84 NAR 12, 4481
118 | Factor XIII-a, hu placenta | >84 | gtaaagtcaaaaATGt | | Grundmann '86 PNAS 83, 8024
119 | von Willebrand factor, hu | 246 | ttgcaggggaagATGa | 1 1 | Verweij '86 EMBO 5, 1839
120 | Protein C, hu | 75 * | agtgcctccagaATGt | 1 | Plutzky '86 PNAS 83, 546
121 | Protein S, hu | >108 | cgcgcettcgaaATGa | | Hoskins '87 PNAS 84, 349
122 | Protein S, bo | >35 | gcccgtttcgccATGa | | Dahlbäck '86 PNAS 83, 4199
123 | procollagen, α2(I), ch | 133 | tagcaagtagacATGc | 2 | Vogeli '81 PNAS 78, 5334
124 | procollagen, α1(I), ch | ~100 | taatatttagaoATGt | 2 |
125 | procollagen, α1(II), rat | 155 | tcgcggtgagccATGa | 1 | Kohno '85 JBC 260, 4441
126 | collagenase, hu skin | >68 | acaaaggccagtATGc | | Goldberg '86 JBC 261, 6600
Complement components & modulators:
127 | preC1r, hu | >51 | gggccttgagaaATGt | | Journet '86 Bio. J. 240, 783
128 | preC2, hu | >36 | agggaggacaccATGg | | Bentley '86 Bio. J. 239, 339
129 | preC3, hu | >60 | tgtcccagcaccATGg | | deBruijn '85 PNAS 82, 708
130 | preC3, mu | 56 | ttttccttcactATGg | | Wiebauer '82 PNAS 79, 7077
131 | preC4, mu | 61 | gatcctecagcoATGc | | Nonaka '86 PNAS 83, 1883
132 | Factor B, hu | 129 | ccttccaacgccATGg | 1 | Wu '87 Cell 48, 331
133 | Factor I, hu | | eacacctccaacATGa | | Catterall '87 BioJ 242, 849
134 | Decay-accelerating factor, hu | >66 | acccggcgcgccATGa | | Caras '87 Nature 325, 545
135 | C1 inhibitor, hu | | gtcgccgcccagATGg | | Bock '86 Biochem 25, 4292
136 | Conalbumin, ch | 76 | ccctgccccaacATGa | | Jeltsch '82 EJB 122, 291
137 | Corticotropin-RF, sh | >127 * | gcgccccctaacATGc | | Furutani '83 Nature 301, 537
138 | Creatine kinase, rat brain | >29 | gccgccgccgccATGc | | Benfield '85 Gene 39, 263
139 | Creatine kinase, ra muscle | >50 | gacgccgccaccATGc | | Putney '84 JBC 259, 14317
140 | Creatine kinase-B, ch | >42 | gtagggacagccATGc | | Hossle '86 NAR 14, 1449
141 | αA-Crystallin, ha | 68 | gecaagaagaacATGg | | Heuvel '85 JMB 185, 273
142 | αB-Crystallin, ha | 43 | cacctagccaceATGg | | Quax-Jeuken '85 PNAS 82, 5819
1439 | βA3/A1-crystallin, mu | 56 | ccaaecaccaagATGg | | Peterson '86 Gene 45, 139
144 | βB1-Crystallin, ch | >70 | ctgaccaccgcgATGt | | Hejtmancik '86 JBC 261, 982
145 | βB1-Crystallin, rat | 38 * | gcatcagaaaccATGt | | Dunnen '86 PNAS 83, 2855


146 | γ-Crystallin, rat | 42 | caaacaacagccATGg | | Moormann '83 JMB 171, 353
147 | δ1-Crystallin, ch | 86 * | aegtegtccgaaATGg | | Hayashi '85 EMBO 4, 2201
148 | Cell cycle ts11 gene, hu | >78 | tgtctgagtgctATGa | 1 | Greco '87 PNAS 84, 1565
149 | Cell cycle "cdc2", hu | >140 | cattgactaactATGg | | Lee '87 Nature 327, 31
150 | Cyclin, hu | >118 | cactccgccaccATGt | | Almendral '87 PNAS 84, 1575
151 | Cyclophilin, hu T-cell | | gtactattagccATGg | | Haendler '87 EMBO 6, 947
152 | Cystic fibrosis Ag, hu | 51 | tccgtgggcatcATGt | | Dorin '87 Nature 326, 614
153 | Cytochrome c, mu | | ttagaattaaaaATGg | | Limbach '85 NAR 13, 617
154 | Cytochrome c, ch | | ctagtactgacaATGg | | Limbach '83 NAR 11, 6931
155 | Cytochr. c oxidase-IV, bo | 32 | ggtggcatcagaATGt | | Lomax '84 PNAS 81, 6295
156 | NADPH-cyto P450 reductase | 70 | ttatcaccaacATGg; gacaccaac | | Murakami '86 DNA 5, 1
Cytochrome P-450 proteins:
157 | P1-450, human | 122 * | cctacactgatcATGc | | Jaiswal '85 NAR 13, 4503
158 | P-450NF, human | >68 | aggaaagtagtgATGg | | Beaune '86 PNAS 83, 8064
159 | P-450c17, human | >41 | cacccagccaccATGt | | Chung '87 PNAS 84, 407
160 | P-450c21, human 9 | 53, 118 | gggegtctegccATGc | + | Higashi '86 PNAS 83, 2841
161 | P-450scc, human | 75 | tgtggggacagcATGc | | Chung '86 PNAS 83, 8962
162 | P-450 1, rabbit | >43 | aggaagaagagaATGg | | Johnson '87 JBC 262, 5918
163 | P-450(M-1), male rat liver | | gagaaggctgccATGg | | Yoshioka '87 JBC 262, 1706
164 | P-450MC, rat | >72 | gccacctagatcATGc | | Yabusaki '84 NAR 12, 2929
165 | P-450db1, rat | >51 | agcaaggcagccATGg | | Gonzalez '87 DNA 6, 149
166 | P-450-PCN, rat | >82 | agacctgcagggATGg | 1 | " '85 JBC 260, 7435
167 | P-450e, rat | 30 | tacaccaggaccATGg | | Mizukami '83 PNAS 80, 3958
168 | P-450a, rat | | tcactggccactATGc | | Nagata '87 JBC 262, 2787
169 | P-450s, ch | >39 | cctctgcccaccATGg | | Hobbs '86 JBC 261, 9444
170 | Cytokeratin type I, bo | 25 | aacagcatcaccATGt | | Rieger '85 EMBO 4, 2261
171 | Cytokeratin 19, bo | 59 | teetgettcgccATGa | | Bader '86 EMBO 5, 1865
172 | Cytokeratin Endo A, mu | 79 | cagacttcaccaATGt | | Vasseur '85 PNAS 82, 1155
173 | Cytokeratin Endo B, mu | >54 | tctccagacaagATGa | | Singer '86 JBC 261, 538
174 | Cytokeratin-8, Xp | >36 | cacagctccaceATGt | | Franz '86 PNAS 83, 6475
175 | Desmin, ha | 81 | cacgccgccaecATGa | | Quax '85 Cell 43, 327
176 | Diazepam inhibitor, rat | >117 | cacctegceagtATGt | | Mocchetti '86 PNAS 83, 7221
177 | DHFR, hu | 71 | cccgctgctgtcATGg | | Chen '84 JBC 259, 3933
178 | DHFR, mu | 55, 115 | cccgctgccatcATGg | | Nunberg '80 Cell 19, 355
179 | Dihydropteridine reductase | >80 | ggcaggagcaggATGg | 1 | Lockyer '87 PNAS 84, 3329
180 | Elastase I, rat pancreatic | 22 | tctctccacaacATGc | | MacDonald '82 Biochem 21, 1453
181 | Elastase II, " " | 22 | cacggacacaccATGa | | "
182 | Elastin, hu | >21 | tttctccccgagATGg | | Indik '87 PNAS 84, 5680
183 | Endopeptidase, neutral, ra | > | agattttaggtgATGg | | Devault '87 EMBO 6, 1317
184 | Endothelial cell GF, hu | >38 | agctgctgagccATGg | | Jaye '86 Sci 233, 541
185 | preproEnkephalin A, hu | 130 * | agegtcaaetccATGg | | Noda '82 Nature 297, 431
186 | preproEnkephalin B, hu | | tgctgagacaggATGg | | Horikawa '83 Nat 306, 611
187 | preproEnkephalin, rat | 156 * | accggcagccccATGg | | Rosen '84 JBC 259, 14309
188 | Enolase, non-neuron, rat | >110 | cagaacttcaccATGt | | Sakimura '85 NAR 13, 4365
189 | Enolase, neuronal, rat | >68 | atcccagccatcATGt | | Sakimura '85 PNAS 82, 7453
190 | Epidermal GF (pre), hu | >436 | cteatcaagattATGc | 1 1 | Bell '86 NAR 14, 8427
191 | Epidermal GF (pre), mu | 350 | gctaaataaaagATGc | | Scott '83 Sci 221, 236
192 | Epididymal proteins D, E | >82 | gaaaatagaaccATGg | | Brooks '86 EJB 161, 13
193 | Epoxide hydrolase, rat | 175 * | cagtceggagtcATGt | 1 | Falany '87 JBC 262, 5924
194 | Erythroid membr pr 4.1, hu | >798 | aaacacagga&cATGc | 3 1 | Conboy '86 PNAS 83, 9512
195 | Erythroid potentiating, hu | >72 | agagaacccaccATGg | | Carmichael '86 PNAS 83, 2407






196 | Erythropoietin, hu | >ieo | ccaggcgcggagATGg | 1 | Jacobs '85 Nature 313, 806
197 | Fatty acid binding protein, rat liver | 46 | ctcattgccaccATGa | | Sweetser '86 JBC 261, 5553
198 | " rat intestine | | acagctgacatcATGg | | Alpers '84 PNAS 81, 313
199 | FA thioesterase, rat | >310 | tcaactcacagaATGg | | Safford '87 Biochem 26, 1358
200 | FA thioesterase, duck | >160 | aattattcaggaATGg | | Poulose '85 JBC 260, 15953
201 | Ferritin, L-chain, hu | 181 | tgccaacc&accATGa | | Santoro '86 NAR 14, 2863
202 | Ferritin, L-chain, rat | 200 | tccggatcagccATGa | | Leibold '87 JBC 262, 7335
203 | Ferritin, H-chain, hu | 216 | ttagtcgccgccATGa | | Hentze '86 PNAS 83, 7226
204 | Ferritin, H-chain, ch | 151 | cccgccagcgccATGg | | Stevens '87 MCB 7, 1751
205 | Ferritin, bullfrog | 145 | caaaccgctgaaATGg | | Didsbury '86 JBC 261, 949
206 | α-Fetoprotein, hu | 44 | taactagcaaccATGa | | Gibbs '87 Biochem 26, 1332
207 | preFibrinogen, Aα, hu | >57 | cccttagaaaagATGt | | Kant '83 PNAS 80, 3953
208 | preFibrinogen-γ, hu | 51 | cactcagacatcATGa | | Rixon '85 Biochem 24, 2077
209 | preFibrinogen-α, rat | 58 | caatcagaaactATGc | 1 | Crabtree '85 JMB 185, 1
210 | " -β, rat | | aaatctgaa&ccATGa | | Fowlkes '84 PNAS 81, 2313
211 | " -γ, rat | 44 | cacccagacactATGa | | Morgan '87 NAR 15, 2774
212 | Fibroblast growth factor | >320 | cccgcagggaccATGg | | Abraham '86 EMBO 5, 2523
213 | Fibronectin, hu | 267 | caccgtctcaacATGc | | Dean '87 PNAS 84, 1876
214 | FSH, β-chain, bo | >48 | caagtgcccaggATGa | | Esch '86 PNAS 83, 6618
Guanine nucleotide binding proteins:
215 | Goα, bovine retina | >196 | gaaggggccaccATGg | | VanMeurs '87 PNAS 84, 3107
216 | Giα, rat brain | >41 | gcggacggcaggATGg | | Itoh '86 PNAS 83, 3776
217 | C M, rat brain | >246 | cccgccgccgccATGg | |
218 | Tα1, bovine retina | >93 | cctgccaggaccATGg | | Medynski '85 PNAS 82, 4311
219 | β-subunit, bovine retina | >94 | taagattagaagATGa | | Fong '86 PNAS 83, 2162
220 | γ-subunit, " | >67 | gcttagcagaagATGc | | Yatsunami '85 PNAS 82, 1936
221 | preproGalanin, po | >225 | ccgtcgctcaagATGc | 1 | Rokaeus '86 PNAS 83, 6287
222 | Gap junction protein, rat | >31 | gaatgaggcaggATGa | 1 | Paul '86 J Cell Biol 103, 123
223 | GAP-43, rat neuronal | >60 | gaagataccaccATGc | | Karns '87 Sci 236, 597
224 | preproGastrin, hu | 65 * | tctgcagacgagATGc | | Ito '84 PNAS 81, 4662
225 | Gastrin-releasing, hu | >55 | cccgtcgggaccATGc | | Spindel '84 PNAS 81, 5699
226 | Gelsolin, hu | | cgtgtcgccaccATGg | | Yin '86 Nature 323, 455

α-Globin family:
227 human adult
228 human embryonic (ζ)
229 baboon θ1
230 mouse adult
231 rabbit adult
232 goat embryonic (ζ)
233 duck adult, major (a*)
234 chicken embryonic

β-Globin family:
235 human fetal (γ)
236 human embryonic (ε)
237 rabbit adult
238 rabbit embryonic (β3)
239 chicken adult
240 chicken embryonic (ρ)
241 Xenopus adult, major
242 Xenopus larval

243 α2u-Globulin, rat salivary

244 preproGlucagon, hu pancr.
245 preproGlucagon, rat
246



55 
193 
32 
36 
46 
36 
55 



105 
101 



preproGlucagon, anglerfish >58



agagaacccaccATGg
caccctgccgccATGt
tccagcgcgggaATGg
caggaagaaaccATGg
gaaggaaccaccATGg
tcagctgccaccATGt
ggagctgcaaccATGg
etctcctgcacaATGg

agtccagacgccATGg
aggectggeatcATGg
aaaacagacagaATGg
agaccagacatcATGg
ccaaccgccgccATGg
cccgctgccaccATGg
tcaactttggccATGg
tctacagccaccATGg

ttccctaccaacATGa

aagacagcagaaATGa
cagaataaaaaaATGa
acggttgtaaacATGa



Baralle '77 Cell 12, 1085
Proudfoot '82 Cell 31, 553
Shaw '87 Nature 326, 717
Baralle '78 Nature 274, 84
Baralle '77 Nature 267, 279
Wernke '86 JMB 192, 457
Erbil '82 Gene 20, 211
Engel '83 PNAS 80, 1392

Slightom '80 Cell 21, 627
Baralle '80 Cell 21, 621
Baralle '77 Cell 10, 549
Hardison '81 JBC 256, 11780
Dolan '83 JBC 258, 3983
Roninson '81 PNAS 78, 4782
Patient '83 JBC 258, 8521
Banville '83 JBC 258, 7924

Laperche '83 Cell 32, 453

White '86 NAR 14, 4719
Heinrich '84 JBC 259, 14082
Lund '82 PNAS 79, 345

247 | Glucose-6-P-dehydrog., hu | 70 | atattcatcatcATGg | 1 | Martini '86 EMBO 5, 1849
248 | Glucose-regl'd 78K prot, rat | 206 | agcgccggcaagATGa | | Chang '87 PNAS 84, 680
249 | Glucose transporter, hu | >180 | cgcagcgctgccATGg | | Mueckler '85 Sci 229, 941
250 | Glucose transporter, rat | >207 | cgcagcgcggccATGg | 7 | Birnbaum '86 PNAS 83, 5784
251 | β-Glucuronidase, hu | >26 | ggaccgggaagcATGg | | Oshima '87 PNAS 84, 685
252 | γ-Glutamyl transpeptidase | >228 | C 99»cgag | | Laperche '86 PNAS 83, 937
253 | Glutathione peroxidase, mu | 37 | aacatctccagtATGt | | Chambers '86 EMBO 5, 1221
Glutathione-S-transferases:
254 | -subunit 2, hu | >55 | gagactgctatcATGg | | Board '87 PNAS 84, 2377
255 | -Ya subunit, rat | 64 * | acagttgctgctATGt | 1 | Pickett '86 PNAS 83, 9393
256 | -Yb1 subunit, rat | >37 | agcgccagaaccATGc | | Ding '85 JBC 260, 13268
257 | -Yc subunit, rat | >42 | gcaattgctgccATGc | | Pickett '85 JBC 260, 5820
258 | GST-placental, rat | 70 | tacgcagcagctATGc | | Okuda '87 JBC 262, 3858
259 | GAPDH, hu | >75 | cgctcagacaccATGg | | Tso '85 NAR 13, 2485
260 | GAPDH, rat | 71 | ctcatagacaagATGg | | Fort '85 NAR 13, 1431
261 | GAPDH, rat | 57 | tataaaggcgagATGg | |
262 | Glycero-P-dehydrogenase, mu | 21 | gcaagcagcaccATGg | | Ireland '86 JBC 261, 11779
263 | Glycogen phosphorylase, hu | >113 | gccgccccagccATGg | | Newgard '86 PNAS 83, 8132
264 | Gonadotropin-β, salmon | | tgagtccttcagATGt | | Trinh '86 EJB 159, 619
265 | Growth hormone, hu | 60 | casctagctgcaATGg | | DeNoto '81 NAR 9, 3719
266 | Growth hormone, rat | 60 | cactgagtggcgATGg | | Page '81 NAR 9, 2087
267 | Growth hormone, salmon | >64 | ttaagagtaaaaATGg | | Sekine '85 PNAS 82, 4306
268 | GH-releasing factor, hu | 91 * | cccgggtgaaggATGc | | Mayo '85 PNAS 82, 63
269 | Haptoglobin, hu | 30 | agaccaaccaagATGa | | Bensi '85 EMBO 4, 119
270 | Heat shock 70K, hu | 212 | gcagggaaccgcATGg | | Hunt '85 PNAS 82, 6455
271 | Heat shock 70K, hu | 119 | gaagcttcagccATGc | | Voellmy '85 PNAS 82, 4949
272 | Heat shock 27K, hu | >40 | gagtcagccagcATGa | | Hickey '86 NAR 14, 4127
273 | Heat shock 73K, rat | 80 | acgceagcaaccATGt | | Sorger '87 EMBO 6, 993
274 | Heat shock 70K, ch | 111 | gaatctatcatcATGt | | Morimoto '86 JBC 261, 12692
275 | Heat shock 108K, ch | 101 | ggccgcggcatcATGa | | Kulomaa '86 Biochem 25, 6244
276 | Heat shock 70K, trout | >60 | ttattcggtaacATGt | | Kothary '84 MCB 4, 1785
277 | Heat shock 70K, Xp | 124 | aggagcgcaaatATGg | | Bienz '84 EMBO 3, 2477
278 | Helix-destabilizing, rat | >28 | catcctaccgtcATGt | | Cobianchi '86 JBC 261, 3536
279 | Heme oxygenase, rat | 128 | cccagtecegegATGg | | Muller '87 JBC 262, 6795
280 | β-Hexosaminidase, α-chain | >168 | gaccagcgggccATGa | | Myerowitz '85 PNAS 82, 7830
281 | HMG-14, hu | >145 | ccccgccgccagATGc | | Landsman '86 JBC 261, 16082
282 | HMG-17, hu | >88 | gccgccgccaccATGc | | Landsman '86 JBC 261, 7479
283 | His-rich glycoprot., hu | >120 | tggtttaacaaaATGa | 3 2 | Koide '86 Biochem 25, 2220
Histocompatibility antigens (MHC):
284 | Class I (hu): HLA-Bw58 | | tcagacgccgagATGc | | Ways '85 JBC 260, 11924
285 | Class II (hu): DR (α-chain) | 64 | cccaagaagaaaATGg | | Schamboeck '83 NAR 11, 8663
286 | Class II (hu): DR (β-chain) | >30 | ctgttctccagcATGg | | Tieber '86 JBC 261, 2733
287 | Class II (hu): DC-3β | 58 | ttcgtctcaattATGt | | Boss '84 PNAS 81, 5199
288 | Class II (hu): SB (DP)-α | | agaccccacaacATGc | | Lawrance '85 NAR 13, 7515
289 | Class I (mu): H-2K b | | teagtcgtcagcATGg | | Kimura '86 Cell 44, 261
290 | Class I (mu): H-2K d | >27 | ccgaccagtgcgATGg | | Lalanne '83 NAR 11, 1567
291 | Class I (mu): Tla gene T3b | | gatttccctaacATGa | | Pontarotti '86 PNAS 83, 1782
292 | Class II (mu): A (β-chain) | >36 | tgtgccttagagATGg | | McDevitt '87 PNAS 84, 2435
293 | Class II (mu): A (β2-chain) | | gtcccctccagaATGg | | Peterson '85 JBC 260, 14111
294 | Class II (mu): E (α-chain) | 48 | cccaagaagaaaATGg | | Mathis '83 Cell 32, 745
295 | Class II (mu): E (β-chain) | 52 | ctetcctgcagcATGg | | Saito '83 PNAS 80, 5520
296 | Class II (mu): E (β2-chain) | | ctctcotcaagcATGg | | Braunstein '86 EMBO 5, 2469
2979 | Class II-assoc'd Ia(In), hu | 58 | cagaagccagtcATGg | | Strubin '86 Cell 47, 619
Histones:
298 | H1, hu | | tttcttgccaccATGt


299 | H2a, hu | 46 | tcagaagtagttATGt
300 | H2b, hu | 40 | ctagacagtgctATGc
301 | H3, hu | 37 | tgtggttttgccATGg
302 | H3, hu | 27 | gcagttctgcgaATGg
303 | H3.3, hu | 111 * | ggtetctgtaccATGg
304 | H4, hu | 36 | ttgctcgtcgtcATGt
305 | H2b, mu | 41 | tctctgttcaetATGc
306 | H3.1, mu | 26 | cggttacttgccATGg
307 | H3.2, mu | 21 | ttctttgtagaaATGg
308 | H1, ch embry. | 38 | acgtccgtcaccATGt
309 | H2A.1, ch | 146 | tcagtcgctgcgATGt
310 | H2A.F, ch embry | 70 | ggcggcggcaccATGg
311 | H2B, ch | 36 | ggagagttegacATGc
312 | H3.3A, ch | * | gteggcagcagcATGg
313 | H3.3B, ch | >105 * | aagtgagaaaa&ATGg
314 | H4, ch embry. | 26 | caggctctcggcATGt
315 | H5, ch erythrocyte | 109 | gaagcggcggccATGa
316 | H1, Xp | 28 | tttacttcaaagATGa
317 | H2A, Xp | 47 | agoacagtaatcATGt
318 | H2B, Xp | 35 | agcagcacaattATGc
319 | H3, Xp | | aactgatacactATGg
320 | H4, Xp | 28 | gctcaagaaagaATGt
321 | Hydroxyindole O-MeTrase, bo | >120 | eceagaaggaagATGt
322*1 | HMG-CoA reductase, hu | 73 to 105 * | tctgtagctacaATGt
323 | HMG-CoA synthase, hu | 63 or 122 | tgctctttcaccATGc
324 | HPRT, hu | 100 to 170 | geeggetcegttATGg
325 | HPRT, mu | 90, 118 | aceggtcccgtcATGc
326 | Ig L-chain, kappa-II, hu | | caccttctcacaATGa
327 | Ig L-chain, kappa-III, hu | | cccagaggaaccATGg
328 | Ig L-chain, kappa-IV, hu | 25 | aggggcagcaagATGg
329 | Ig H-chain, IgE, hu | >56 | cegttcctcaccATGg
3309 | Ig L-chain, kappa, mu | 18 | catcacaccagcATGg
331 | Ig L-chain, lambda^, mu | 40 | ggtttgtgaattATGg
332 | Ig H-chain, mu | >45 | agtcctgtcactATGa
333 | Ig L-chain, ch | | tgggattccgccATGg
334 | Inhibin, A-subunit, hu | >144 | gccaggtgagetATGg
335 | Inhibin, A-subunit, bo | >60 | gccaggggagctATGt


Inhibitors: see also EPA,

336 anti-Protein C, hu

337 α1-antiChymotrypsin, hu
338 anti-Elastase, hu

339 anti-placental PA, hu

340 antiThrombin III, hu

341 α1-antiTrypsin, hu liver
" macrophage

342 Insulin, hu

343 Insulin-I, rat

344 Insulin, gp
345* Insulin, ch

346 Insulin, anglerfish

347 Insulin, salmon

348 Insulin-like GF I, hu

349 Insulin-like GF II, rat

350 Integrin, ch

351* Interferon-α2, hu (LeIF A)

352 Interferon-α, mu



Kininogen, Lipocortin, Macro- and
agaacatccaccATGc
gcagagttgagaATGg
cctgccttcaccATGa
cagattgaaacaATGg
agattagcggccATGt



>46 




17 




>5S 




<70 




49 


* 


520 


* 


59 


* 


57 


* 


60 


* 






>85 




>71 




>1B0 




100 


* 


96 
~70




~70





tgtccttctgccATGg
cattgttccaacATGg
catectttcatcATGg
ccccagcteatcATGg
ttctactgcagcATGg
cctaccatcaccATGg

acttcagaagcaATGg
cttcaggtaccaATGg
cggcgcgcggccATGg

gcaacatctacaATGg
gcaacactcaccATGg


(1) 



Carozzi '84 Sci 224, 1115
Zhong '83 NAR 11, 7409

Clark '81 NAR 9, 1583
Wells '87 NAR 15, 2871
Heintz '81 Cell 24, 661

Sittman '83 NAR 11, 6679

Sugarman '83 JBC 258, 9005
D'Andrea '81 NAR 9, 3119
Harvey '83 PNAS 80, 2619
Grandy '82 JBC 257, 8577
Brush '85 MCB 5, 1307
Dodgson '87 NAR 15, 6294
Sugarman '83 JBC 258, 9005
Krieg '83 NAR 11, 619

Turner '83 NAR 11, 4093
Moorman '82 FEBS 144, 235

Moorman '81 FEBS 136, 45

Ishida '87 JBC 262, 2895
Luskey '87 MCB 7, 1881 and
" '85 JBC 260, 10271
Gil '87 PNAS 84, 1863

Patel '86 MCB 6, 393
Melton '84 PNAS 81, 2147

Klobeck '85 NAR 13, 6499

Marsh '85 NAR 13, 6531
Kenten '82 PNAS 79, 6661
Kelley '82 Cell 29, 681
Picard '83 PNAS 80, 417
Early '80 Cell 19, 981
Reynaud '87 Cell 48, 379

1 Mayo '86 PNAS 83, 5849
Forage '86 PNAS 83, 3091

Microglobulins

Suzuki '87 JBC 262, 611
Chandra '83 Biochem 22, 5055
Stetler '86 NAR 14, 7883
Ye '87 JBC 262, 3718
Bock '82 NAR 10, 8113
Ciliberto '85 Cell 41, 531
(1) Perlino '87 EMBO 6, 2767
Bell '80 Nature 284, 26
Lomedico '79 Cell 18, 545
Chan '84 PNAS 81, 5046
Perler '80 Cell 20, 555
Hobart '80 Sci 210, 1360
Sorokin '82 Gene 20, 367

1 Rotwein '86 PNAS 83, 77

Soares '86 JMB 192, 737

Tamkun '86 Cell 46, 271

Lawn '81 PNAS 78, 5435
Shaw '83 NAR 11, 555

353 Interferon-α, rat ~72

354 Interferon-α, bo ~70

355 Interferon-β, hu fibroblast 75

356 Interferon-β2, hu 63

357 Interferon-γ, hu immune

358 Interferon-γ, rat

359 IFN-induced gene 6-16, hu

360 IFN-induced ISG-54K, hu

361 IFN-induced 15K prot., hu






362 Involucrin, hu 



364 
365 
366 
367 
368 
369 

370 
371 
372 
373 

374 

375 

376 

377 

378 

379 

380 

381 

382 

383 

3B4 

385 

386 

387 
388 
389 
390 
391 
392 
393 
394 
395 



130 
110 

106 
75 
>7S 

62 



Keratin-I, 50K, hu 60
Keratin, 67K, hu 47
Keratin-II, 56K, hu
Keratin-I, 47K, mu
Keratin-I, 59K, mu
Keratin B2A, sh
(see also #170-174)
protein Kinase C, α/β, ra >221
protein Kinase C, γ, ra
protein Kinase, cAMP-alt
" II (Ca2+), rat

>116
25

>204
>50
~200

preKininogen, hu 130, 154, 184
α-Lactalbumin, hu ~26
LDH-A, hu >99 *

LDH-C, mu >54
Lamin C, hu >200
L-C acyltransferase, hu 24
Lens MIP, bo >50
Leukocyte adhesion pr. β, hu >72

Lipase, rat hepatic
Lipase, rat lingual
Lipoprotein lipase, hu
Lipid binding protein, mu
Lipocortin-II, hu
apoLipoprotein A-I, rat
apoLipoprotein A-I, ch
apoLipoprotein A-II, hu
apoLipoprotein A-II, mu
apoLipoprotein II, VLD, ch
apoLipoprotein A-IV, mu
apoLipoprotein B, hu
apoLipoprotein C-II, hu
apoLipoprotein E, rat



>15 

48 
>174 

65 
>50 

40 * 

58 * 
>41 

77 * 

91 
126 

38 * 

65 • 



gccaeatttgccATGg
teaaggtccccgATGg
egtgttgtcaacATGa
aggageceagetATGa
ctctcggaaacgATGa
agctctgagacaATGa

gegegegccaccATGc
tgcagtgcaaccATGo
cagcccacagccATGg

gtagcttctaagATGt

ctcctctgcaccATGa
ctctaagtcaacATGa
atctctggaaccATGg
cctctctcagccATGa
cactacaccaccATGt
actcctgacaccATGg

cccgcgcgcaagATGg
ttgggggggaccATGg
atcgccccagtcATGg
atcgccaccgccATGg

gattgttagateATGa

ggggtagpcaaaATGa

tccaagtccaatATGg

gtaaggctc&acATGt

aacctgccggccATGg

accagggctggaATGg

atcccccctgccATGt

acaccgagggacATGc

aagacgagagacATGg

tagcagtacaagATGt

acgcgccccgagATGg

aaggtttacaaaATGt

gcttccttcaaaATGt

acatecttcaggATGa
ttcagegcgaagATGa
ectgttaceaacATGa
tagtctgccatcATGa
taccaacaaaccATGg
tgaggagccaggATGt
ccgcagctggcgATGg
tctctggacactATGg
acaactgggaagATGa



396   Luteinizing hormone (β), rat      7        atcaagaATGg

Lymphokines:
397   CSF-1, hu macrophage              178      ccagctgcccgtATGa
398   GM-CSF, mu                        35       gtcctgaggaggATGt
399   multi-CSF, hu (IL-3)              >38      gccgatccaaacATGa
400   multi-CSF, mu                     29       cagaacgagacaATGg
401   deleted
402   Interleukin-1α, hu                >45      aaagaagtcaagATGg
403   Interleukin-1β, hu                87 *     tctgaagcagccATGg
404   BSF-1, hu (IL-4)                  >63      cgacacctattaATGg
405   BSF-1, mu (20K)                   63       acagagctattgATGg
406   BSF-2, hu                                  aggagcccagctATGa


Dijkema '84 NAR 12, 1227
Capon '85 MCB 5, 768
Ohno '81 PNAS 78, 5305
Haegeman '86 EJB 159, 625
Gray '82 Nature 298, 859
Dijkema '85 EMBO 4, 761

Kelly '86 EMBO 5, 1601
Levy '86 PNAS 83, 8929
Blomstrom '86 JBC 261, 8811

Eckert '86 Cell 46, 563

Marchuk '85 PNAS 82, 1609
Johnson '85 PNAS 82, 1896
Tyner '85 PNAS 82, 4683
Knapp '86 NAR 14, 751
Krieg '85 JBC 260, 5867
Powell '83 NAR 11, 5327

Ohno '87 Nature 325, 161

Showers '86 JBC 261, 16288
Bennett '87 PNAS 84, 1794

Kitamura '85 JBC 260, 6610

Hall '87 Bioch. J. 242, 735

Tsujibo '85 EJB 147, 9

Sakai '87 Bioch. J. 242, 619

Fisher '86 PNAS 83, 6450

McLean '86 NAR 14, 9397

Gorin '84 Cell 39, 49

Kishimoto '87 Cell 48, 681

Komaromy '87 PNAS 84, 1526

Docherty '85 NAR 13, 1891

Wion '87 Sci 235, 1638

Phillips '86 JBC 261, 10821

Huang '86 Cell 46, 191

Haddad '86 JBC 261, 13268
Lusis '87 JBC 262, 7058
Shelley '85 JMB 186, 43
Kunisada '86 NAR 14, 5729
AB '83 NAR 11, 2529
Williams '86 MCB 6, 3807
Protter '86 PNAS 83, 1467
Wei '85 JBC 260, 15211
Fung '86 JBC 261, 13777

Jameson '84 JBC 259, 15474

Ladner '87 EMBO 6, 2693
Miyatake '85 EMBO 4, 2561
Dorssers '87 Gene 55, 115
Miyatake '85 PNAS 82, 316

March '85 Nature 315, 641
Clark '86 NAR 14, 7897
Yokota '86 PNAS 83, 5894
Otsuka '87 NAR 15, 333
Hirano '86 Nature 324, 73
No.   mRNA (source)                     Leader   Sequence           Upstream ATGs      Reference
                                        length                      (w, w/t, s, s/t)

407   Lymphotoxin (TNF-β), hu           >79      ttggttctccccATGa                      Gray '84 Nature 312, 721
408   Lysophospholipase, rat            21       cagacactcactATGg                      Han '87 Biochem 26, 1617
409   Lysozyme, ch                      29       gacactggcaacATGa                      Jung '80 PNAS 77, 5759
410   α2-Macroglobulin, hu              >43      tctttctgcaacATGg                      Kan '85 PNAS 82, 2282
411   (^-Microglobulin, rat             >63      cctttccgcagcATGg                      Gehring '87 JBC 262, 446
412   Malate dehydrogenase, mu          >50      cccgeccfcagccATGc                     Joh '87 Biochem 26, 2515
413   Malic enzyme, mu                  >65      ccggtgccagccATGg                      Bagchi '87 JBC 262, 1558
414   Malic enzyme, rat                          aoaatactqqccatgq                      Magnuson '86 JBC 261, 1183
415   Mn superoxide dismutase, mu       >55      taaacctcaataATGt
416   Melanoma Ag p97, hu               >60      cccgacggcgccATGc                      Rose '86 PNAS 83, 1261
417   Menadione reductase, rat          >74      acttctggagccATGg                      Robertson '86 JBC 261, 15794
418   Metallothionein-IA, hu            73       ccgcggctcgaaATGg                      Richards '84 Cell 37, 263
419     -IB, hu                         69       cttggctccacaATGg                      Heguy '86 MCB 6, 2149
420     -IF, hu                         71       cctcggcttgcaATGg                      Varshney '86 MCB 6, 26
421     -II, hu                         69       cttcagctcgccATGg                      Karin '82 Nature 299, 797
422     -Ia, sh                         72       cttttoctccaaATGg                      Peterson '86 EJB 160, 579
423   ^Microglobulin, hu                >72      gagcccatcgccATGa                      Traboni '86 NAR 14, 6340
424   ^Microglobulin, mu                >52      tcagtcgtcagcATGg                      Daniel '83 EMBO 2, 1061
425   Mullerian inhibiting subst.       10       agcacccacgATGc                        Cate '86 Cell 45, 685
426   Multidrug resistance, hu          140      cgcgaggtcgggATGg                      Ueda '87 JBC 262, 505
427   Mx protein, mu                    >213     gagagccagacgATGg   1                  Staeheli '86 Cell 44, 147
428   Myelin basic protein, mu          47       ggcttggatgtgATGg   1                  Takahashi '85 Cell 42, 139
429   Myelin P2 protein, mu             >44      aaggtttacaaaATGt                      Bernlohr '84 PNAS 81, 5468
430   Myelin P0 (peripheral), rat       >31      cctcccccagctATGg                      Lemke '85 Cell 40, 501
431   Myelin-assoc. "MAG", rat          >130     ttgctggacaagATGa                      Arquint '87 PNAS 84, 600
432   Myeloperoxidase, hu               >163     aggagaagagagATGg   1                  Morishita '87 JBC 262, 3844
433   Myoglobin, hu                     70       tcagactgcgccATGg                      Waller '86 MCB 6, 4539
434   Myoglobin, mu                     55       ttagaagccaccATGg                      Blanchetot '86 EJB 159, 469
435   H-chain, rat embry. skel.         90 *     tcagccaacactATGa                      Strehler '86 JMB 190, 291
436   H-chain (fast), ch. adult         60       gtgagcgcagccATGg                      Gulick '85 JBC 260, 14513
437   H-chain (fast), ch. embry.        101      taaacagcgacgATGg
438   L-chain 1, mu                     125      cttttaatcaaaATGg                      Robert '84 Cell 39, 129
439   L-chain 3, mu                     94       tagaactccatcATGt
440   L-chain 2, rat skel.              56       aggatctaagacATGg                      Nudel '84 NAR 12, 7175
441   L-chain 1, ch skel.               123      aagoaacacaaaATGg                      Nabeshima '84 Nat 308, 333
442   L-chain 3, ch skel.               71       caactctcaatcATGt                      '82 NAR 10, 6099
443   L-chain 2A, ch cardiac                     ctctgcgaagacATGg                      Winter '85 JBC 260, 4478
444   Neurofilament p68, mu                      ccggccgccaccATGa                      Lewis '86 MCB 6, 1529
445   Neuroleukin, mu                   >52      gggtccctcgccATGg                      Gurney '86 Sci 234, 566
446   Neural cell adhesion, ch          >215     ccgccggctgcgATGc                      Edelman '87 Sci 236, 799
447   Neural cell adhesion, mu          161      cggcagtttacaATGc                      Barthels '87 EMBO 6, 907
448   Neuropeptide Y, hu                86       gcgccagccaccATGc                      Minth '86 JBC 261, 11974
449   Nerve GF (α), mu                  42       acacctgttaccATGt                      Evans '85 EMBO 4, 133
450 l   "  "  (β) submax. gland         99 *     ctcctagtgaacATGc                      Selby '87 MCB 7, 3057
451     "  "  (γ), mu                   42       acacctgtcaccATGt                      Evans '85 EMBO 4, 133
452   Nuclear prot. N1/N2, Xp           >64      gggttgctgatcATGg                      Franke '86 EMBO 5, 3547
453   Nucleoplasmin, Xp                 >113     tatctacgtgacATGg   2                  Dingwall '87 EMBO 6, 69
454

455 

456 
457 
458 
459 
460
461 
462 
463 
464 
465 

466 

467 

468

469 

470 

471 

472 

473 

474 

475 

476 

477 

478 

479 

480

481 

482 

483 

484 

485

486 

487 



proto-Oncogenes:
c-abl, mu, type I mRNA >93
" type IV mRNA >200

c-bcl-2, hu, 5.5 kb mRNA >1000
" 3.5 kb mRNA >150



c-bcr, hu
c-erb-A, hu
c-erb-A, ch
c-fes/fps, fe
c-fms, hu
c-hst, hu
c-int-1, mu
c-int-2, mu
pp56-LSTRA, mu
c-lyn, hu

c-mos, mu, ovarian mRNA
" " testicular mRNA
c-myb, hu
c-myb, mu
c-myc, hu

c-myc, fe

c-neu, hu (HER2)

c-pim-1, mu

c-raf-1, hu

A-raf-1, hu

c-ral, simian

c-Ha-ras-1, hu

c-Ha-ras-1, rat

c-Ki-ras, mu

N-ras, hu

R-ras, hu

rho (ras-related)

c-sis, hu (PDGF2)

c-src, ch

c-src, Xp

c-syn, hu ("c-slk")
c-yes, hu *
p53, hu



>SJ4 
>300 



>300 

>236 
184 • 

>326 

>193 
297 
-v70 
280 

>113 



ggecacgggaccATGt 
tattattgctttATGg 
cctctgggaaggATCg 

gccggccgcgceATGg 
acccccaacagtATGa 
gaattgcgg tgaATCg 
gcggacggcactATGg 
cccaccgaggccATGg 
ectcgggccgggATGt 
gceag'gcaggccATGg 
cgcgatgccgggATGg 
ccgggagggateATGg 
cgagcgggaaa tATGg 

tetgagggtgtaATGc 



MOO 
>129 
>194 



gcccgccgcgccATGg
200 to 680 gcccgcctcgccATGg t
400, 570* cctcccgcgacgATGc
400, 587* gcaggcgccgcgATGc
178 + gcagtgagcaccATGg
ctggaggtggggATGc
taagctgcatcaATGg 1
atctaaggctccATGg 1
ctgtgacacgagATGg
69 to 332 * ccctgaggagcgATGa (1) (2)

~75 * cctgtagaagcgATGa 1
200 to 250 * ggcctgctgaaaATGa

~245 tgctggtgtgaaATGa 2

65 agcggtggcgacATGa
>159 gttgcctgagcaATGg
1022 cccggagtcggcATGo 2
>100 * cagcccaccaccATGg
>58 caacaggacaagATGg 1
>589 ggaatttagataATGg 1
>162 gcagatttgataATGg
138, 230* cgggtcactgccATGg (2)



488   preproOpiomelanocortin, bo        129 *    cctgcctggaagATGc
489   preproOpiomelanocortin, Xp        62 *     tccagtcctgaaATGt
490   Ornithine ATase, hu               >54      ttgaaggacacaATGt
491   Ornithine ATase, rat              >50      aggacccacacaATGc
492   Ornithine decarboxylase           >300     acatcgagaaccATGa
493   OTCase, mu                        136      agcaaaaagaagATGc
494   Ovalbumin, ch                     64       tcagagttcaccATGg
495   Ovoinhibitor, ch                           aggtgctctgccATGa
496   Ovomucoid, ch                     53       cagtacctcaccATGg
497   3-Oxoacyl-CoA thiolase, rat       >100     ctgagcttcgtcATGg
498   preproOxytocin, bo                33       cgcgtctgcaccATGg
499   preproOxytocin, rat               40       aacaccaacgccATGg
500   Pancreatic polypeptide, hu        >50 *    tctggcctccggATGg
501   Parathyroid hormone, hu           >70 *    ttgtatgtgaagATGa
502   Parotid secretory protein, mu     55 *     agcaaaccaaagATGt
503   Pepsinogen, hu                    54       ccgggaagaaccATGa
504   Pepsinogen, rat                   >60      caaaccggcatcATGa
505   Peroxi. enoyl-CoA hydratase       24       taccttgagaaaATGg



Ben-Neriah'66 Cell 44,577 

2 Croce '86PNAS 83, 5214 
1 

1 Adams 'B7 EMBO 6, 115 

2 Evans 1 86 Nature 324 , 64 1 
1 sap *86 Nature 324, 635 

Roebroek '87 JV61. 2009 
Coussens«6 Mature 320, 277 
Taira "87 PNAS 84, 2980 
1 Varmus 'B5MCB 5, 3337 
Moore * 86 EMBO 5, 919 
Sefton '86 Nature 319,682 
Yamanashi '87HCB7, 237 

Propst '87 HCB 7, 1629 
Hajello '86 PNAS 83, 9636 
Watson '87 EMBO 6, 1643 

± Saito '83 PNAS 60, 7476 
Stewart *B6 Virol 154,121 

1 Tal '67 HCB 7, 2597 

Selton 'B6 Cell 46, 603 
Bonner ' 86 NAR 14 , 1009 

1 Beck "87 HAR 15, 595 

Chardin '86 EMBO 5, 2203 
Bonkawa '87 HCB 7, 2933 
Damante '87 PKAS 84, 774 
Hoffman 'B7 HCB 7,2592 
Kail '85 HAR 13, 5255 
Lowe '87 Cell 4B, 137 
Yeramian -87 HAR IS, 1869 

1 Rao '86 PHAS 83, 2392 
Takeya '83 Cell 32, 881 
Steele *85NARl3, 1747 

2 Semba '86 PNAS 83, 5459 
Sukegawa '67 HCB 7, 41 

<l> Lamb '86 HCB 6, 1379 

Nakanishi "81 EJB 115,429 
Martens '87 EJB16S,467 

Xnana '86 PKAS 83, 1203 
Hueckler '85 JBC 260,12993 

Gupta '85 JBC 260, 2941 
Veres '86 JBC 261 , 7588 
McReynolda '78 Nat 273,723 
Scott '87 JBC 262, 5899 
Catterall ' 80 JCel IB 87,480 
7 Arakawa '87 EMBO 6, 1361 
Ruppert '84 Nature 30B,5S4 
Ivell ' 84 PNAS 81 , 20O6 
Lelter '85 JBC 260, 13013 
Vasicek ' 83 PNAS 80, 2127 
Poulsen '86 EMBO 5, 1891 

Sogava 'S3 JBC 258, 5306 
Ichihera '86 EJB 161, 7 

I Shi i '67 JBC 262, 8144 
506

507 

508

509 
510 

511 
512 
513 

514 

515 

516 
517 

518

519 

520 

521 

52 2 m 

523 

524 

525 
526 

527 
528 
529 
530 
531 
532 

533 
534 
535 

536 
537 

538 
539 
540 
541 
542 
543 
544 
545 
546 



Phenylalanine hydroxylase >222 
phosphate carrier prot. bo >62 
Phosphodiesterase, cyclic >59 
PEP carboxykinaae , rat 143 
PEP carboxykinaae , ch 



94 
>20 



PGK-1 (X-linked) , hu 
PGK-2, om testicular 
PGX, mu X-linked 
y-Phoaphor-kinase, nu 
Phosphorylase, purine, hu >109 

Pituitary hormone a, a, hu 100 
- t a,nu 100 

plasma cell glycoprotein >1H 

u-Plasninogen activator hu 119 

PDGF, A-chain, hu >387 

platolet factor 4, rat 73 

PolymeraBe-BtDRA), rat 51 

Polymerase II (RHA), mu 406 

Poly(A) binding protein, hu>502 

Porphobilinogen deaminase >83 

Prealbumin, hu 26 



eggggagccagcATGt 
cttagggagaagATGt 
ttcttccgcaaaATGt 

accattgcaagaATGc 

166, 246 • gcagctgeagtaATGg 
tgtatttceaaaATGt 
cateccatcaagATGg 
ggtcttgccaaaATGt 

atccaegtgaccATGa 

gtctgcgagaccATGg 

gaaaggagcgccATOg 
tgcagaagagctATGg 

cagagcggggcgATGg 

gacctcgccaccATGa 

cctcgggacgcgATGa 

caccctcttgacATGa 

qtccccgqcaccATGa 

gcctgcctcgccATGc 

agccgtgcogagATGa 

aacagcccaaagAlGa 
attcttggcaggATCg 



prolactin growth hormone family > 

prePlacental lactogen, hu 62 

prePlacental lactogen, nu >59 
Proliferin, mu 68 
Prolactin, hu 57 
Prolactin, rat >51 
Prolactin, bo *67 

proline-rich (acidic) .ha 33 

" " (glycosylated) 
Prostatio BP, C2 chain, rat 41 

Protamine 1, no 92 
Protamine, trout 1* 
Proteases: see also 109.109.114,126,363.503-4,519
-batroxobin, snake 179 cgagttgaagctATGg

-ser protease, mu adipose 19 cctgctgtcagaATGc
-Ca2+ protease, hu 105-155 * tgagtcgcagccATGt

-Ca2+ protease, ra >150 tgagccgcagccATGt

-Ca2+ protease, ch 37 cagtacgcagctATGa

-cys protease, mu >59 ggtgtttgaaccATGa

-ser protease, EGF binding acacctgttaccATGt
-mast cell protease, rat 35 accactggcacaATGc
-ser p'ase, cytotoxic T cells >U1 cttccggggacgATGa



cacctagtggcaATGg 
aactcctcagagATCa 
gactctgcagagATGc 
acgatcacgaacATGa 
gtgg tcatcaccATCa 
atcatcaccaocATGg 

gcctcctccaagATGc 
gcctccagcgagATGc 
aaactgagcaccATGa 

caagceagcaccATGg 
coatcaatcaaaATGc 



Kwok 'eSBiochem24, 556 
BunswicX *87 EHBO 6, 1367 
lWrihara '87 JBC 262,3256 
Beale ' 85 JBC 260, 10748 
(1) (1) Cook '66 PHAS 83; 7583 
Rlggs '64 Gene 32, 409 
Boer '87 HCB 7,3107 
Hori '86 Gene 45, 275 
Caskey '87 PHAS 84, 2886 
Williams '84 HAB 12, 5779 
Fiddes '81 JKAG 1,3 
Chin '81 PHAS 78, 5329 
vanDrlel '87 JBC 262,4882 
Riccio '85KAR13, 2759 
1 1 Be tsholtz '86 Nat 320,695 
Do! '87 HCB 7,898 
yanaguchi '87 HCB 7.2012 
Ahearn '87 JBC 262, 10695 
Grange '87 KAR 15, 4771 
Raich '86 HAR 14, 5955 
Sasaki '85 Gene 37, 191 

Saunders '83 JBC 258, 3787 
Jackson '86 PHAS 83, 8496 
Linser '87 EHBO 6, 22B1 
Truong 1 84 EHBO 3 , 429 
Cooke '80 JBC 255, 6502 
Sasavage '62 JBC 257,678 

Ann '87 JBC 262, 3958 
Haeda 1 85 JBC 260, 11123 
Delaey '87 HAR 15. 1627 
Peschon • 87 PHAS 84 , 5316 
Gregory '82 KAR 10, 7581 

Itoh ( 87 JBC 262, 3132 
Hin '86 HAR 14, 8879 
Kiyake ' 86 KAR 14 , 8B05 
2 Emori '86 JBC 261, 9472 

Ohno 'MHature 312, 566 
Portnoy * 86 JBC 261 , 14697 
Lundgren '64 JBC 259,7780 
Benfey '87 JBC 262 , 5377 
Brunet '86 Nature 322,268 



548° 

549 

550 

551 

552 

553 



>91 
162 



PDX (disulphide isomerasa), rat 
Proteoglycan 19 (chondral tin) 

" 38K core protein, hu 
Proteolipid protein, mu 
pulmonarjr_surff actajit , hu 53 
Pyruvate kinase, ch 80 * 

Ouinone reductase, rat >113 



ccgacgtccgacATGc 

gagctggtcaggATGc 
atgag a taaatcATGa 
agtgccaaagacATGg 

ggacccagagccATGt 

actccagtaaccATGt 

ttcaac ta tgccATGa 



•85 Nature 317, 267 

Bourdon 1 86 JBC 261 , 12534 
Kruslus '86 PHAS 83, 76B3 
Hudson '87 PHAS B4, 1454 
White *85 Nature 317 , 361 
fconberg '83 PHAS 80,3661 
Bay nay '87 JBC 262, 572 
554 snRNP-B" antigen (U2), hu >125

555 RNP-C protein, hu >122

556 U1-RNA-associated 70K, hu >680



557 
558 
559 
560 
561 
562 
563 
564 
565 
566 
567 
568 
569 
570 
571 
572 
573 

574 

575 

576 
577 
578 
579 

580 

581 

582 
583



Receptors i 

6 2 -adrenergic, ho 190 
B-adrenergic, turkey 69 
-for aslalo-GP, Ll chain 65 
-for " L2chain,rat >153 
-for EGF, hu (HERl) 100 
-for estrogen, hu 232 
-GPIIIa<rel. to integrin), hu 
-for IgE, hu 213 
-for IgAS IgM(epithel) >123 
-for insulin, hu 
-for insulin-like GF-I, hu 
-for interleukin-2(a), hu 159,217 
-for LDL, hu *^80 
-for nerve growth fact, hu >113 
-for PDCF, mu > 138 

-for progesterone, ra >125 
-for SRP 

preproRelaxin, rat *v60 
prepro Benin, hu 44 

Retinol-Bindinq protein, hu «6 1 
Betinol-BP, rat >94 
Retinol-BP-II. rat >55 
Retlnol-BP, Xp >40 
Rhodopsin, bo 96 
Ribonuclease, pane, rat 75 

Ribonucl. reduct. HI, mu >242 
» ■ H2 , rou >62 

>268 



595 
586 
587 
588 
589 
590 
591 
592 
593 
594 



596 
597 
598
599 

600 
601 
602 
603 

604 

605 
606 



.Rlbophorin II, hu 
RibQflomal proteins 
rp 614, hu 
rp S16, mu 
rp S19, Xp 
rp Ll, Xp 
rp L14, Xp 
rp L27, nu 
rp L30, mu 
rp L31, rat 
rp L32, mu 
rp L44, hu 

RSV- induced 9E3 protein, ch 77 

Scrapie PrP27-30, ho 90 * 

Seeretogranin I, hu >112 

Seminal vacl protlV.rat 22 

Sodium ch.prot I, rat >251 

Soaatostatin-I, hu 105 
rat 100 
* -ll, anglfsh >59 



>46 

40 
39 



>25 
51 
>83 



-22, catfish 
SorcinAl9, ha 

SPARC, nu embry. endodorm 90 
Stearyl CoA desaturase, rat>102 



tttaacacaaacATGg 
ccatcaaaeacgATGg 
ggcgagacgaagATCg 

aga c tgeg egecATGg 
cgccccgcagccATGg 
cccagtgctatcATGa 
cetagggecatcATGg 
eggggagea gcg ATG c 
cggccacggaccATGa 
gaggcggacgagATCc 
agcaggaccgccATGg 
cagccaccagccATGg 
gctcecgcagccATGg 
caaataaaaggaATGa 
agggtcaggaagATGg 
gaggctgcgagcATGg 
gggaggcgggcgATCg 
agcccggacoccATGg 
gttcaggtcgacATGa 
cctgctgecgceATGc 

gcccagaccgg&ATGt 

actgagggaagcATGg 

ttcctgggcaagATCa 
tctgtccccaaaATGC 
g a g g ccg ce a tc ATGa 
ttgtgaaagaagATGq 

agggccgcagceAUGa 

agcaaagccactATGq 

ettctagcggcgATGc 
ccctcgttcgccATCc 

ctgc tcggaggaATCg 

cgacgtgcagaaATCg 
g tgctcggagctATCc 
atagccggcaagATGa 
agcagegaggagATGg 
acagccgccatcATGg 
tctgccaccgctATCc 
t aaggc nggaagATGg 
gggcccggcagaATGg 
tcaaaaggcatcATGg 
cctgctgcaaagATGg 

acactcctaaccATGa 
agatcagccatcATGg 
ecgagcggggccATGc 
t t ttc tggc aag ATGa 
caggatgacaagATGg 

c gcggcgecgagATGc 
gaggcaggggagATGc 
ccagcagacagtATGc 
gctaccaagaagATGt 

g t ag tc ttcaccATGg 

gttcccagcatcATGa 

ccgacagccacgATGc 



Habcts 'B7 PNAS 84,2421 
Swanson 'B7 KCB 7, 1731 
Theissen 1 B6 EHB0 5 , 3209 

Kobilka '87 PHAS 84 , 46 
yarden 1 86 PNAS 83, 6795 
Leung '85 JBC 260,12523 
McPhsul '87 KCB 7, 1841 
ishli '85 PNAS 82,4920 
Green '86 Nature 320, 134 
Fitzgerald ' 67 JBC 262 ,3936 
Ikuta '87 PSAS 84, 819 
Mostov '84 Nature 308, 37 
Ullrich '85 Nature 313, 756 
Ullrich '86EHB0 5, 2503 
Leonard '85 Sci 230, 633 
Sudhof '85 Sci 228,615 
Johnson "86 0611 47, 545 
Yarden '86 Nature 323, 226 
Loosfelt 'B6 PNAS 83,9045 
Lauffer '85 Nature 318, 334 

Hudson '81 Nature 291,127 

Fukamiiu 'B6 Gene 49, 139 

Cortege '85EMB0 4, 1981 
Shennan '87 PNAS B4, 3209 
Li 'B6 PNAS B3, 5779 
HcKearin '87 JBC 262 ,4939 

Nathans '83 0611 34, 807 

HacDonald'82 JBC 257,14582 

Caras '65 JBC 260, 7015 
Thelander *86 KCB 6, 3433 
Crimaudo 'a7 EMBO 6, 75 



Rhoads '86 HCB 6, 2774 
Wagner '85 HCB 5, 3560 
Amaldi '82 Gene 17,311 
Loreni '85 EHB0 4, 3483 
Beccari '87 NAR 15, 1B70 
Belhuraeur '87 NAR 15, 1019 
Wiedemann ' 84 HCfi 4 , 2518 
Tanaka '87 EJB 162, 45 
Dudov '84 Cell 37, 457 
Davie 8 '86 Gene 45, 183 
Sugano '87 Cell 49,321 
Basler '86 Cell 46, 417 
Benedura '87 EMBO 6, 1203 
Kandala '83 NAR 11, 3169 
Noda *86 Nature 320, 168 
Shen '64 Sci 224,168 
Dixon '84 JBC 259,11798 
Hobart '80 Nature 288,137 
Dixon '82 PNAS 79,5152 
Borot *86 EKBO 5, 3201 
Mason '86 EKB0 5 , 1465 
Thiede ' 86 JBC 261 , 13230 
607
608
609 
610 

611 
612 
613 
614 
615
616 
617 
618
619 
620 



621 
622 
623 
624 
625 
626 

627 
628 
629 
630 



631 
632 



634 
635 

636 
637 
638 

639 
640 

641 
642 

643 



645 

646 

647 

648 

649 
650 

651 
652 

653 
654 
655
656 



proSucrase-isoaaltase, ra 
2-SA synthetase, mu 
Synthetase. hic-tRNA , ha 
t complex protein 1. wu 
T-cell antigen racepton 
Ti a-chain, hu 
Ti B-chaln, hu 
Tl T-chain (CD3), hu 
T- a -chain , nu 
T- Vg chain (3B. 25) , an 
T- V s chain (5.1)* mu 
T- V„ chain, nu 
73 -antigen, if-chain, hu 
T3 -antigen, 4-chain, hu 
T3-antigen, c -chain f hu 

T-cell differentiation antigens t 
CD2 (huaan Til) 

C04 (human T4) >75 

CD4 (rat) >53 

CDS (human Leu-2/T8) >115 

CDS, o-chaln (Lyt-2,»u) 333 
CD8, 37K-chain (rat) 

Other T-cell proteins i 
cytotoxic pT 49 protein, isu 
16K HAL protein, hu >55 
proty-6 , nu 90 
IgE binding factor (soluble) >93 

TachyXinln(neuromedinK)bo 144 * 
Tachykinin (eubstnceP), rat 99 * 
ThxoraboBpondin, hu >120 

Thy-1, tai 78 * 

Thy-l-relatad HBCOX-3, rat 

Thymidine kinase, hu 60 
ha 





caatgaaataagATGg 




tccagacttagcATGg 


77 


ttggcageeaggATGg 


>6Q 


cgtttcctgaagATCg 


>170 


cactg c tcagccATCc 


52, 


tctcactctgccATGg 


>140 


agag aggaagg eATGc 


>5S 


caaggctcagceATGc 




ttctaagccaccATGg 




ctgagaggaagcATGt 




c tacagcag accATGa 




acagagactgacATGg 


95 


ttccgctgcgagATGg 




catgaaacaaagATGc 



Thyaidylate synthase, hu >90 
• m m mu 24, 34, 51 

>177 



41 
41 



proThymoiin-a, hu 
Thymosin-f^, rat 

Thyroglobulln, hu 
Thyroglobulin, bo 
Thyrotropin, 8-«ubunit, bu >89 
» -releasing; hormone Xp>109 
Thyroxine -binding globulin, hu 
Transferrin, hu 50 



TGF-a, hu 
ICF-fll, hu 

Transin, rat 
Transin-2, rat 

Translation factors i 
Elongation factor la, hu 
Elongation factor 2, ha 
eIF2, rat 

cap binding protein, hu 



>841 



>53 
>77 



IB 



ccaacccctaagATGa 
ggcaaggccacaATGa 
aagcaggccaccATGt 
ggggagcgcgtcATGg 
ggagogcacaccATGg 
aagagcgccaagATOc 

cgcactgoaaggATGa 
eagcacgccgtcATGg 
ccttc tc tgaggATGg 
gtaaagtggaaaATGg 

tccacaggcatcATGc 
gcaaaatcca&cATGa 

aacagctccaooATGg 
actcttggcaccATGa 
cccagagcaaggATGg 

cccggaggcgcaATGa 
cgcaoagccgceATGa 
ag cggcgc gaac ATGa 

gcccg-ecgcgccATGc 
gactgctcogttATGc 

gcgtgceccaccATCt 
cttceagceaccATGt 

agggcoaggaaaATGg 
aaggcteocaagMCg 

g ttg ttcaaagcATGa 

ao agcaggaaagATGg 

cttccttccaaaATCt 

cgcacccggaagATCa 

cccgcccgtaaaATGg 
gccgectcccccATCc 

aagccagtggaaATOa 
aaggctgtctctATGg 

ctaaeagocaaaATGg 
ceatecgccactATGg 
atacaettcagaATGc 
gatcgatctaagATCg 



Hunsiker '86 Cell 46, 227 
Ichii ' 86 BAR 14 , 10117 
Tsui '67 HAR 15, 3349 
Hilllson '86 Cell 44, 727 

Vanagi '85 PHAS 82, 3430 
Smith *87 NAR 15, 4991 
Littman '67 Mature 326, 85 
Saito *84 Mature 312, 36 
Goveroan '85 Cell 40, 859 
Chou '87 PHAS 84, 1992 
Carman '86 Cell 45, 733 
Kriasansen '86EMB0 5, 1799 
Tunnacliffe '86 EMBO 5,1245 
Gold * 86 Mature 321. 431 

Sayre ' B7 PNAS 84 , 2941 
Haddon '85 Cell 42, 93 
Clark '87 PHAS 84 , 1649 
Sukhatme '85 Cell 40, S91 
Nakauchi '87 BAR 15, 4337 
Johnson '86 Hature 323, 74 

Koyana '87 PHAS 84, 1609 
Alonso '67 PSAS 84, 1997 
LeClair '86 EKB0 5, 3227 
Martens '85 PHAS 82, 2460 

Kotani '86 PHAS 83, 7074 
Krause '87 PHAS 84, 881 

Dixit '86 PHAS B3 . 5449 
Giguero '85EMB0 4, 2017 
Clark '85EHB04, 113 
Kreidberg '86 KCB 6,2903 
Lewis •86HCB6, 1998 
Herrill '84KCB4, 1769 
Takeishi *85HAR 13, 2035 
Deng '86 JBC261, 16000 
Bergor '86 PHAS 83, 9403 
Korecker '84 PHAS 81,2295 
Chrlstophe '85 MAR 13, 5127 
Kereken '85 Mature 316,647 

Gurr '83 PHAS 80, 2122 

Richter'84EMB0 3,617 

FlinX '86 PHAS 83, 770B 

lucaro '86HAR14, 8692 

Derynck '84 Cell 38, 287 
Derynck *85 Nature 316,701 

Breathnach'B7 HAR 15 , 1139 



Brands '86 EJB 155, 167 
Kohno '86 PHAS 83,4978 
Ernst *87 JBC263, 1206 
Rychllk '87 PHAS 84 , 945 
No.
657
658
659 

660 
661 
662 
663 

664 
665 
666 

667 
668 

669 

670 

671 

672" 

673 

674 

675 

676° 

677 

678

679 

680 

681 
682 

683 
684 

685 
686 
687 



w w/t . 



Transcription fact. TFI I IA 50 

Triose-P-isomerase, hu 34 
- ch 52 

Tropomyosin, hu fibroblasts 118 
TH30 ol " " >5 ° 
TH-1 , rat fibroblasts >61 
O-Troporayosin, rat muscle 76 

Troponin-I, fast muscle B2 • 
Troponin -T, rat 79 * 

Troponin -C, slow muscle, ch 



Trypslnogen, anionic, ca 
" cat ionic, ca 

a-Tubulin I, CHO 
a -Tubulin II, CHO 
a-Tubulin III, CHO 
a-Tubulin, hu 
8-TubuHn, hu 
6-Tubulin, hu 
6-Tubulin , ch 

83- Tubulin, ch 

84- Tubulin, ch 

85- Tubulin, ch 

Tyrosinase, mu 
Tyr aminotransferase, rat 



14 
29 

MOO 
>100 
>i30 
211 
72 
159 
>87 
40 to 50 
74 
48 

>174 

97 * 



689 
690 
691 
692 
693 
694 
695 
696 
697 

698 
699 

700 
701P 



Tyr hydroxylase, hu *O0 
Tyr hydroxylase, rat 35 

Ubiquitln, hu 100 * 

Ublquitin, ch 63 * 

UDP-glucuronosyl-tr'ase , hu 
■ " steroid- induced, rat>75 
" " 3-HC induced, rat >124 
Uncoupling prot. (brown fat) 177 
major Urinary pr(KUP),mu 60 
Uroporphyrinogen decarb. hu 
Urotenain-I, carp >75 
Uteroglobin, ra 47 
Valosln •precursor", po >143 
Vasoactive intest . peptide 174 * 
vasopressin-neurophysinll 48 
Viaentin, ha 135 
Vinculin, ch >246 

13 
13 



Vitellogenin-II, ch 
Vitellogenin, Xp 

prewhey acidic protein, rat 33 

preXenopsln, Xp >62 



gctgaaggagagATGg 
ctcggctcggccATGg 
gtcgcctccgccATGg 

ccaccgcaggccATGg 
gcgctccgcgccATGg 
cccaecgcagccATGg 
gccaccgccaccATGg 

ate taaag caagATGt 
c ccaccgccac tATGt 
ccctgcccggccATGg 

acttctgccatcATCa 
cagggagca acc ATGa 

cccgtagctaccATGc 
aaagcagcaaccATGc 
ttcctagacaccATGc 
tcccgggaaaacATGc 1 
gccgccgccatcATGa 
taaattttaaccATGa 
gacaccggcatcATGc 
. gccgaagccatcATCa 
tccggccgcaccATGa 
cgggacagcgccATGa 

gcttcgagaagaATGa 2 

gcttcgagaggcATGg 

ccacactgagccATGc 
ccagcttgcactATGc 

taacaggtcaaaATGc 
gg agacgtaaacATGc 

cattgcatcaggATC t 
ttgatttttaagATCc 
ctctctgaaaggATGg 1 

ctccgagccaagATGg 

c tccc taccaaaATGa 

agaca gc tgaccATGg 

cc tgtgtccagcATGa 

cattc tgccaccATGa 

gaagcgcgcgccATGg 

agaggcacagaaATGg 

acccgtgccaggATGC 

gctctccaaaccATGt 

ccegc tgccgccATGc 

ttcaccttcgctATCa 
ttcgccatcaccATGa 

gccgccgacaccATGe 

catttggaaaggATGt 



Too 'B6NAR14, 2187 
Brown '85KCB 5, 1694 
Straus '85 KCB 5, 3497 
MacLeod '85 PNAS 82, 7835 
MacLeod '87 JKB194, 1 
Helfman '85 JDC 260,14440 
RuU-Opazo'87 JBC 262.47S5 

Baldwin '85 PNAS 82,8080- 
Breitbart '86 JMB 189,313 
Putkey 1 87 HCB 7, 1549 
pinsky '85 HCB 5, 2669 



Elliott '86 KCB 6, 906 

Hall '85 HAR 13, 207 
Lewis '85 JMB 1B2, 11 
Lee -83 Cell 33, 477 
Cleveland '81 Nat 289,650 
Sullivan '86 JBC 261 ,13317 
Sullivan * B6 MCB 6 , 4 4 09 

Shibahara '86 NAB 14,2413 
Grange '85 JMB 164 , 347 
Crima '87 Nature 326, 707 
Harrington' 87 NAB 15,2363 

Baker '87 HAR 15, 443 
Bond ' B6 MCB 6, 4602 
Jackson '87 BiochJ. 242,581 
Harding '67 NARlS, 3936 
lyanagi'86 JBC 261,15607 
Ricquier' B6 JBC 261,1487 
Ehahan 1 87 KCB 7 , 1938 
Goossens'86 JBC 261,9625 
Ishida '86 PNAS 83, 308 
Suske 1 83 HAR 11 , 2257 
Roller *87 Nature 325, 542 
Under '87 PNAS 84, 605 
Ruppert *B4 Nature 308, 554 
Quax '83 Cell 35,215 
Price ' 87 Bioch. J . 24 5 , 595 

Gelser '83 JBC 258,9024 
Walker ' 83 EMBO 2, 2271 
Campbell '84 NAR 12, 8685 
Surea 'B4 PNAS 81, 380 



a A two-letter abbreviation indicates the source of each mRNA: bovine (bo);
canine (ca); chicken (ch); feline (fe); guinea pig (gp); hamster (ha); human
(hu); murine (mu); porcine (po); rabbit (ra); sheep (sh); Xp, Xenopus.

b The number of nucleotides comprising the 5'-untranslated sequence is indicated.
In most cases this was determined by primer extension and/or S1 mapping. An
entry is marked > if the cDNA included a considerable portion of the 5'-noncoding
sequence but was probably not complete. There is no entry in this column if the
cDNA included little of the leader sequence or if the 5'-end of the mRNA was not
mapped on the genomic sequence. If a second form of mRNA was detected but its
leader sequence not precisely mapped, the entry is marked +. An asterisk
indicates that the 5'-noncoding sequence is interrupted by an intron.

c Upstream ATG codons are designated strong (s) or weak (w) according to context
(see text). They are listed according to whether or not the reading frame
established by the upstream ATG codon terminates (t) before the start of the
major open reading frame. If a gene produces two major transcripts, only one
of which has upstream ATG codons, the upstream ATG codons are listed in
parentheses. ± means that upstream ATG codons occur in only a minor species of
mRNA. If an upstream ATG codon lies very near the error-prone 5'-end of a
cDNA clone and has not yet been confirmed by sequencing either the gene or a
second cDNA, I have temporarily entered a question mark in this column.
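
The bookkeeping in footnotes b and c can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used to compile the table: the function name upstream_atgs, the 0-based indexing, and the example sequences are hypothetical. The leader length of footnote b equals the index of the initiator ATG, and "terminates" is taken to mean that a stop codon in the upstream frame ends before the initiator ATG begins.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_atgs(mrna, main_start):
    """List every ATG upstream of the initiator codon.

    mrna       -- transcript sequence written as DNA (A, C, G, T)
    main_start -- 0-based index of the A of the functional initiator ATG;
                  this is also the length of the 5'-untranslated leader
    """
    mrna = mrna.upper()
    if mrna[main_start:main_start + 3] != "ATG":
        raise ValueError("main_start does not point at an ATG codon")

    found = []
    pos = mrna.find("ATG")
    while pos != -1 and pos < main_start:
        # Walk the upstream reading frame codon by codon and stop looking
        # once a codon would overlap the initiator ATG.
        terminates = any(
            mrna[i:i + 3] in STOP_CODONS
            for i in range(pos + 3, main_start - 2, 3)
        )
        found.append({
            "position": pos,
            "terminates_before_main_orf": terminates,
            "in_frame_with_main_orf": (main_start - pos) % 3 == 0,
        })
        pos = mrna.find("ATG", pos + 1)
    return found


# Hypothetical leader with one upstream ATG whose frame ends at TAA.
print(upstream_atgs("GGATGCCCTAAGCCACCATGGCT", 17))
# Hypothetical leader with an in-frame upstream ATG and no stop codon,
# the situation footnote j describes for the longer IGF-II transcript.
print(upstream_atgs("ATGCCCGCCACCATGG", 12))
```

An upstream ATG that is in frame and does not terminate would extend the N-terminus of the encoded protein if used; one that terminates defines a short upstream reading frame instead.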

d Bibliographic data are given in condensed form: first author, year, journal,
volume, and first page.
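
The condensed citation form can be unpacked mechanically. The sketch below is an assumption-laden illustration: the names Citation and parse_citation are hypothetical, the regular expression covers only the common pattern seen in the table (for example "Gray '84 Nature 312, 721"), and two-digit years are read as 19xx because the compilation closed on May 31, 1987.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    first_author: str
    year: int
    journal: str
    volume: int
    first_page: int


# author, apostrophe + two-digit year, journal abbreviation, volume, first page
_PATTERN = re.compile(
    r"^\s*(?P<author>.+?)\s*'(?P<yy>\d{2})\s*(?P<journal>.+?)\s+"
    r"(?P<volume>\d+)\s*,\s*(?P<page>\d+)\s*$"
)


def parse_citation(text):
    """Split a condensed table citation into its five fields."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in condensed citation form: {text!r}")
    return Citation(
        first_author=match["author"],
        year=1900 + int(match["yy"]),  # assumption: every entry is 19xx
        journal=match["journal"],
        volume=int(match["volume"]),
        first_page=int(match["page"]),
    )


print(parse_citation("Gray '84 Nature 312, 721"))
print(parse_citation("Jackson '87 BiochJ. 242,581"))
```

Entries damaged in scanning (a missing apostrophe, or digits read as letters) will raise ValueError rather than be guessed at.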

e Alternative splicing produces transcripts with three different leader sequences,
only one of which is listed. The second form of mRNA also has three upstream
ATG codons, while the third has a single weak ATG codon upstream.

f Eight of the upstream ATG codons lie in the same reading frame, constituting an
upstream cistron with the potential to encode a 40 amino acid peptide. In view of
the pattern of codon usage, Shull and Lingrel postulate that this peptide is made.

g It is likely that ribosomes initiate at the first and second ATG codons in
these mRNAs, producing long and short forms of the encoded polypeptide. In each
case the 5'-proximal ATG codon occurs unusually close to the cap and in a
suboptimal context for initiation; it is not known which of those features
accounts for the "leakiness." The distance from the cap to the first ATG codon
is 5 nucleotides for #143, 7 n for #297 and 3 n for #330.

h Unlike the human HMG-CoA reductase mRNA sequence which is entered in the table,
a subset of transcripts from the corresponding hamster gene have upstream ATG codons.

i The sequence of the chicken insulin gene has a weak ATG codon upstream from
the translational start site, but it is not known whether the 5'-end of the
transcript includes that ATG codon.

j There is a developmentally regulated switch in the promoter for IGF-II in rats.
The longer transcript introduces an upstream in-frame ATG codon, and initiation
at that site would add eleven amino acids to the N-terminus of IGF-II. Only
the shorter transcript is represented in the table because the functional
initiator codon in the longer transcript has not been verified experimentally.

k α-Interferons A and D are the most highly expressed in humans and both of
those mRNAs have A in position -3, as shown. Other human α-IFN genes that are
expressed at lower levels have C in position -3, but it is not known if that
substitution accounts for their lower expression.
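
Position numbering in the table can be made concrete with a short sketch. It assumes the layout visible in the entries: lowercase flanking nucleotides, the initiator ATG in capitals, and one following nucleotide, with the A of ATG counted as +1. The function name context_bases is hypothetical; the example strings are entries 433, 409 and 396 as they appear in the table.

```python
def context_bases(entry):
    """Return the nucleotides at positions -3 and +4 of a table entry.

    With the A of the capitalised ATG as +1, position -3 is three
    nucleotides before the A and position +4 directly follows the G.
    """
    start = entry.find("ATG")  # flanking sequence is lowercase, so this is unambiguous
    if start < 3 or start + 3 >= len(entry):
        raise ValueError(f"entry too short to read -3 and +4: {entry!r}")
    return entry[start - 3], entry[start + 3]


print(context_bases("tcagactgcgccATGg"))  # Myoglobin, hu (#433)
print(context_bases("gacactggcaacATGa"))  # Lysozyme, ch (#409)
print(context_bases("atcaagaATGg"))       # short 7-nucleotide leader (#396)
```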

l In the major form of NGF mRNA in mouse submaxillary glands, the indicated
initiator codon is the first ATG triplet in the message. The exon that carries
that ATG codon is spliced out of the major NGF transcript in other tissues
and an ATG codon that lies farther downstream is thereby activated.

m The first cDNA that was cloned fell short of the real initiator codon,
resulting in misidentification of an internal ATG codon as the translational
start site.






Nucleic Acids Research 



n Whereas the context around the initiator codon is standard in the α- and
β-tubulin sequences shown in the table, one α- and one β-tubulin mRNA have been
described in which the ATG codon lies in a poor context for initiation (Cowan
'83 MCB 3, 1738; Lee '84 NAR 12, 5823). It is not known whether or how well
those particular mRNA species are translated.

o The chicken β3-tubulin gene produces mRNAs with very heterogeneous 5'-ends,
some of which would lack the upstream ATG codon.

p Entries 363 and 401 have been deleted, leaving 699 sequences on which the
calculations in Tables 1 and 2 are based. This includes all published
sequences to which I had access as of May 31, 1987, in which the functional
initiator codon has been clearly identified. Another 110 sequences were excluded
because of uncertainty about which ATG codon initiates translation; that list
was made available to the editor during the review of this manuscript.
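
Calculations over the compiled sequences amount to counting nucleotides position by position after aligning every entry on its initiator ATG. The sketch below illustrates that kind of tally on four entries copied from the table (#409, #426, #433, #434). It is not the computation behind Tables 1 and 2; the function name position_counts is hypothetical, and four sequences say nothing about the frequencies in the full set of 699.

```python
from collections import Counter, defaultdict


def position_counts(entries):
    """Count nucleotides at positions -12..-1 and +4, aligned on ATG."""
    counts = defaultdict(Counter)
    for entry in entries:
        start = entry.index("ATG")       # the A of the initiator codon is +1
        for offset in range(-12, 0):     # positions -12 through -1
            j = start + offset
            if j >= 0:                   # leaders shorter than 12 nt simply skip
                counts[offset][entry[j]] += 1
        if start + 3 < len(entry):       # position +4 follows the G of ATG
            counts[4][entry[start + 3]] += 1
    return counts


sample = [
    "gacactggcaacATGa",  # 409 Lysozyme, ch
    "cgcgaggtcgggATGg",  # 426 Multidrug resistance, hu
    "tcagactgcgccATGg",  # 433 Myoglobin, hu
    "ttagaagccaccATGg",  # 434 Myoglobin, mu
]
counts = position_counts(sample)
for position in (-3, -1, 4):
    print(position, dict(counts[position]))
```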
180   The Logic and Machinery of Gene Expression

Figure 3.51
Translational suppression of terminator codons. Mutations causing a change in
the anticodons of tRNA [superscript illegible], tRNA [superscript illegible],
or tRNA Ser permit translation of termination codons as amino acids.
(Figure labels: wild type tRNAs; suppressor.)
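
The effect described in the caption can be sketched in Python. The example is an assumption-labelled illustration, not a reconstruction of the figure: the function name translate and the example mRNA are hypothetical, the standard genetic code is used, and the suppressor is modelled as a mutant serine tRNA that reads UAG, one of several possible cases.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code; "*" marks the termination codons UAA, UAG and UGA.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, suppressor=None):
    """Translate from the first nucleotide until a termination codon.

    suppressor -- optional mapping from a termination codon to the amino
                  acid inserted by a mutant tRNA that reads that codon
    """
    suppressor = suppressor or {}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = suppressor.get(codon, CODON_TABLE[codon])
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGGCUUAGGGAUAA"  # hypothetical message with an internal UAG
print(translate(mrna))                           # stops at UAG
print(translate(mrna, suppressor={"UAG": "S"}))  # reads through UAG as serine
```

This simplified model suppresses every UAG it meets; it does not represent the competition between suppressor tRNA and termination at each such codon.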




ribosome that are involved in the codon-anticodon interaction. Certain
chemicals such as aminoglycosides (e.g., streptomycin) bind to ribosomal
proteins in the 30S subunit and can alter the fidelity of translation. In
these cases, there is a more widespread breakdown in the accuracy of the
translation process.
3.8 mRNA Translation in Eukaryotes

Translation of eukaryotic mRNA is basically similar to that of prokaryotic
mRNA. With the exceptions already noted, the genetic code is identical
and the codons are translated successively by aminoacyl-tRNAs in
conjunction with ribosomes. There are, however, three notable differences
imposed by certain characteristics of eukaryotic cells. First, the
transcription and translation machinery in eukaryotes are physically separated,
transcription occurring in the nucleus and translation in the cytoplasm.
Second, the 5' and 3' ends of eukaryotic mRNAs have special structures.
Third, with the exception of the mRNAs transcribed from the DNA
genomes of viruses, eukaryotic mRNAs usually contain only a single
protein coding sequence.

At present, we know considerably less about the structures and
properties of the participants in eukaryotic translation than of their
counterparts in prokaryotes. Although the same three stages (initiation,
elongation, and termination) are discernible in eukaryotes, each is more
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protein has glycine




complex in the number of extraribosomal protein factors that are required.
In spite of the differences, protein coding sequences from prokaryotes are
[illegible] translated by the eukaryotic translation system, provided that
their mRNAs possess the appropriate modifications at the 5' and 3' termini
(Section 3.8a). Conversely, eukaryotic protein coding sequences are
translated efficiently in prokaryotes, provided they contain a
Shine-Dalgarno sequence 5' to the initiator AUG. This means that the
translation machinery of both types of organisms can contend with the
nucleotide sequence arrangements in mRNAs from whatever source.

Special Modifications in Eukaryotic mRNAs
Eukaryotic mRNAs transcribed from nuclear or viral DNA genes by RNA
polymerase II always have a modified 5' terminus, referred to as a "cap"
(Figure 3.53). But RNAs transcribed by eukaryotic RNA polymerases I
(rRNAs) and III (5S and tRNAs) are not capped and retain their
[illegible] 5'-triphosphate termini. Most of the mRNA produced by animal
viruses is also capped, even though it is produced by virus encoded
transcriptases. Most uncapped mRNAs are poorly translated by
eukaryotic protein synthesizing systems because of inefficient ribosome
binding to the mRNA. Capping occurs at the 5' nucleoside triphosphate
[illegible] before the transcript is completed. The details of the capping
process are presented in Section 8.3c.

(Margin figure labels: mutation causes substitution of arginine; missense
mutant.)

Eukaryotic mRNAs also contain a polyadenylate sequence at their 3'
ends. This 3' "tail," which is 50 to 200 adenylate residues long, is not
encoded in the sequence of protein coding genes, but is added
posttranscriptionally after cleavage of the transcript at a specific
sequence beyond the translation termination signal (Section 8.3c).

Initiation of Translation by Small Ribosomal
Subunits at the 5' Capped Ends of mRNAs

Just as the dissociation of 70S ribosomes is an obligatory step for
initiating the translation of prokaryotic mRNA, the 80S ribosomes must be
dissociated before the translation of eukaryotic mRNAs can begin. The
small subunit (40S), in association with an array of accessory proteins
(eIFs), one or more of which are needed to dissociate the ribosome into its
component subunits, binds the special initiator met-tRNA_f^Met. Here, too,
GTP and a special protein, eIF-2, are needed to bind the initiator
aminoacyl-tRNA. In eukaryotes, however, met-tRNA_f^Met is not
N-formylated; but as in prokaryotes, the structure of tRNA_f^Met is
different from tRNA_m^Met. The 40S complex containing met-tRNA_f^Met, GTP,
and a variety of other eIFs binds to the mRNA at or near the capped 5' end;
[illegible] one of the factors recognizes and binds to the cap structure.
The small subunit moves from the capped terminus to the first AUG
downstream from the 5' end by a mechanism that is still unknown. At present
there is no requirement for a nucleotide sequence comparable to the
Shine-Dalgarno sequence near the AUG. However, the efficiency with
which an AUG serves as an initiator codon is influenced by certain
nucleotides on both sides of the AUG. This preinitiation complex combines
with the 60S subunit in an energy- and factor-dependent reaction
to create the functional initiation complex. The initiation of protein
synthesis can be regulated by phosphorylation and dephosphorylation of
eIF-2.

(Figure labels: suppressor mutant; suppressor tRNA^Gly with altered
anticodon recognizes the codon for arginine.)

Figure 3.52

Translational suppression of missense
mutations. In this example, a tRNA^Gly
acquires an altered anticodon that
permits insertion of glycine at an
arginine codon, AGA.
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

Figure 3.53

The cap structure at the 5' end of most
eukaryotic mRNAs. In addition to the
7-methyl guanylic acid that is linked to
the primary transcript through a
triphosphate bridge, the cap frequently
includes 2'-O-methyl groups on the
first or first two ribose residues in
the transcript.
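
The initiation steps described in this subsection (movement of the small subunit from the capped terminus to the first AUG, with efficiency influenced by flanking nucleotides) can be sketched in Python. This sketch is not part of the exhibit. The text does not say which flanking nucleotides matter; the rule used here, a purine at position -3 and a G at position +4 counting the A of the AUG as +1, is an assumption supplied for illustration, and the two sequences are hypothetical.

```python
PURINES = {"A", "G"}


def first_aug(mrna):
    """Index of the 5'-most AUG in any reading frame, or -1 if there is none."""
    return mrna.find("AUG")


def flanking_nucleotides(mrna, start):
    """Nucleotides at -3 and +4, counting the A of the AUG as +1."""
    minus3 = mrna[start - 3] if start >= 3 else None
    plus4 = mrna[start + 3] if start + 3 < len(mrna) else None
    return minus3, plus4


def favorable_context(minus3, plus4):
    # Assumed rule of thumb (not from the text): purine at -3 and G at +4.
    return minus3 in PURINES and plus4 == "G"


for mrna in ("GCCGCCACCAUGGCG", "UUUUUUAUGCCC"):
    start = first_aug(mrna)
    minus3, plus4 = flanking_nucleotides(mrna, start)
    print(start, minus3, plus4, favorable_context(minus3, plus4))
# prints: 9 A G True
#         6 U C False
```

The search ignores reading frame on purpose: in the description above the subunit stops at the first AUG it meets, and the reading frame is then set by that AUG.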



Polypeptide Chain Elongation and Termination

The stepwise translation of successive codons with aminoacyl-tRNAs is
not substantially different in eukaryotes and prokaryotes. GTP and the
elongation factor eEF-1, which corresponds to the prokaryotic complex
of EF-Tu and EF-Ts, cycle to bring aminoacyl-tRNAs to the ribosome.
GTP and eEF-2, which is functionally analogous to prokaryotic EF-G,
promote the translocation operation. Termination of translation in
eukaryotes also occurs at any one of the three stop codons, causing release
of free polypeptide chains, tRNA, and, very likely, the 80S ribosome from
the mRNA. One factor, eRF, and GTP appear to mediate the entire series
of termination events.
The biogenesis of eukaryotic mRNA is discussed in detail in Section
8.3, but here we need to stress that translation does not occur until the
mature mRNA reaches the cytoplasm, whereupon initiation can occur as
mentioned previously. Polysomes are formed by successive initiations,
and simultaneous, multiple translations of the protein coding sequence
occur as in prokaryotes. A notable difference, however, is that cellular
mRNAs generally contain only a single protein coding sequence. When
mRNAs contain consecutive coding regions, as occurs with many animal
DNA viruses, the downstream coding sequences are inefficiently or not
translated.
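
The last point above, that downstream coding regions are translated inefficiently or not at all, can be illustrated with a short Python sketch. It is not taken from the exhibit. It assumes the standard genetic code and a hypothetical message with two consecutive coding regions, and it treats the "inefficiently or not translated" outcome as strictly "not translated". The function names are illustrative.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def read_coding_region(mrna, start):
    """Translate from `start` to the first stop codon.

    Returns the protein and the index just past the last codon read.
    """
    protein = []
    pos = start
    while pos + 3 <= len(mrna):
        residue = CODE[mrna[pos:pos + 3]]
        pos += 3
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein), pos


def consecutive_coding_regions(mrna):
    """List (start index, protein) for each coding region, 5' to 3'."""
    regions, pos = [], 0
    while True:
        start = mrna.find("AUG", pos)
        if start == -1:
            return regions
        protein, pos = read_coding_region(mrna, start)
        regions.append((start, protein))


# Hypothetical message with two consecutive coding regions.
message = "GGAUGAAAUAACCAUGGGGUGA"
regions = consecutive_coding_regions(message)
print(regions)       # [(2, 'MK'), (13, 'MG')]
print(regions[0])    # the only region read when initiation is at the first AUG
```

Both regions are valid coding sequences on paper, but under the first-AUG rule only the region starting at index 2 is reached by initiating ribosomes.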
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Figure 18-17

Nuclear pores. (a) Electron micrograph
of part of a nuclear envelope with a
number of pores. (b) Cross section
through two pores (n = nucleus, e =
envelope, c = cytoplasm). (c) Diagram
of a pore, showing the eight pairs of
granules that line it and a central
granule. Although yeast does have nuclear
pores, detailed electron microscopic
studies have not yet been carried out.
(Courtesy of K. Roberts, John Innes
Institute, Norwich, England.)
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not in [illegible] eucaryotes, the 5S gene is located close to [illegible]
units and is part of the same repeating unit, even though it is
transcribed by a different [illegible] and in the opposite direction.

Because the rDNA units are tandemly located next to each other,
intramolecular crossing over between them occasionally generates
yeast cells containing small circles which are rapidly [illegible]
since they do not possess centromeres [illegible]
is important for maintaining the sequence homogeneity of the
individual repeats within the larger unit by mechanisms that will be
discussed in Chapter 50.

Within all eucaryotic cells, the actively transcribed rDNA units are
somehow compacted into dense-appearing nucleolar bodies; in
yeast [illegible]
the nuclear membrane.

Caps at the Ends of Eucaryotic mRNAs

Eucaryotic mRNAs, including those of yeast, have several [illegible]
features that differentiate them from procaryotic mRNAs. First, they
are modified at their 5' ends by the addition of a guanine nucleotide.
This alone would not be considered unusual if the linkage were the
conventional 3'-5' phosphodiester bond. Instead, GTP reacts with
the [illegible]
terminated by ribose moieties with free 2'- and 3'-OH groups.
Subsequent to cap formation, a methyl group is added to the backward
guanine residue (at the 7 position of the purine ring) and often to
the 2'-OH groups of the first and/or second adjacent nucleotides
(Figure 18-18).

Why do the 5' ends of eucaryotic mRNAs need to be so blocked?
One possible reason is that these caps (and apparently specific
proteins that bind to them) help ribosomes attach to mRNA chains so
that they start translation at the correct AUG codon. Favoring this
hypothesis is the absence of sequences complementary to 18S rRNA
preceding coding regions in eucaryotic mRNA molecules. Specific
ribosome binding sequences analogous to those of procaryotic mRNAs
thus do not exist on eucaryotic mRNA molecules. Instead, eucaryotic
ribosomes search out AUG initiator codons by binding to the caps
and then migrating to the closest (5'-most proximal) AUG codon to
start translation. Apparent exceptions to this rule are several viral
mRNAs (e.g., polio RNA) that function perfectly normally in
eucaryotic cells but lack the 5' cap structure. Their ends are blocked
instead by specific proteins that perhaps substitute as positioning agents.
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Figure 18-[illegible]

A generalized structure for eucaryotic
mRNA, showing the posttranscriptional
modifications at the 5' and 3' ends.



Yeast mRNA Molecules Code for
Single (Never Multiple) Polypeptide Chains 54-55

A given yeast mRNA molecule, like most other eucaryotic mRNAs,
carries the genetic message for only a single polypeptide chain. Unlike
[illegible] several sites to give rise to several independent [illegible],
eucaryotic mRNAs are constructed so that translation
[illegible] at the first AUG codon following the capped 5' end.
([illegible] will be discussed in Chapter [illegible].)

[illegible] bacterial mRNAs, however, eucaryotic
[illegible] have untranslated [illegible]
protein-coding regions (leader and trailer [illegible]).

The inability of eucaryotic mRNAs to encode multiple proteins and
thereby ensure their coordinate regulation probably explains why
[illegible] eucaryotic polypeptides consist of several
[illegible] having related enzymatic functions. The yeast HIS4 mRNA
[illegible] that carries out [illegible]
steps in histidine biosynthesis. Yet, it is rare that a single
polypeptide carries out all the steps involved in [illegible]
coordinate [illegible]
of several functionally related mRNAs must be [illegible]
this latter [illegible]
to carry out the [illegible]
of arginine [illegible] are encoded by several different
[illegible] than being regulated by a single polycistronic mRNA.



Poly A at the 3' Ends of Eucaryotic mRNAs 56-[illegible]

[illegible] observation [illegible] mRNA molecules
[illegible] cell, [illegible] relatively long
stretches (about 200 residues) of poly A at their 3' ends. These poly A
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